Open Access

CUDA’s Mapped Memory to Support I/O Functions on GPU

Wei Wu, Fengbin Qi, Wangquan He, Shanshan Wang
Jiangnan Institute of Computing Technology, Wuxi 214083, China

Abstract

The API provided by CUDA helps programmers write high-performance applications for the GPU, but it supports few I/O operations in device code. This paper exploits the characteristics of CUDA's mapped memory to build a dynamic polling service model on the host that supports most common I/O functions, such as file read/write and "printf", from device code. Implementing these I/O functions has only a modest influence on the performance of the original applications; they respond quickly to users' I/O requests, and the "printf" implementation outperforms CUDA's own. The I/O functions also yield an easy and effective real-time method for users to debug their programs. Together they improve the productivity of porting legacy C/C++ code to CUDA and broaden the range of functions available to CUDA programs.
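To make the idea concrete, the following is a minimal sketch, not the authors' implementation: device code posts a request into CUDA mapped (zero-copy) memory, and a host-side polling loop services it while the kernel is still running. The request layout, the state protocol, and names such as IoRequest and worker are illustrative assumptions.

#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// One request slot in mapped (zero-copy) memory, visible to host and device.
struct IoRequest {
    volatile int state;   // 0 = empty, 1 = pending, 2 = serviced
    char message[256];    // payload for a printf-style request
};

// Device side: publish a message in the shared slot and spin until the
// host-side polling service acknowledges it.
__global__ void worker(IoRequest *req)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        const char msg[] = "hello from device code\n";
        for (int i = 0; i < (int)sizeof(msg); ++i)
            req->message[i] = msg[i];
        __threadfence_system();       // make the payload visible to the host
        req->state = 1;               // mark the request pending
        while (req->state != 2) {}    // wait for the host to service it
    }
}

int main()
{
    // Mapped memory must be enabled before the CUDA context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    IoRequest *h_req = nullptr, *d_req = nullptr;
    cudaHostAlloc((void **)&h_req, sizeof(IoRequest), cudaHostAllocMapped);
    memset((void *)h_req, 0, sizeof(IoRequest));
    cudaHostGetDevicePointer((void **)&d_req, (void *)h_req, 0);

    worker<<<1, 32>>>(d_req);

    // Host-side polling service: while the kernel is still running, watch
    // the mapped buffer and service any request that appears.
    while (cudaStreamQuery(0) == cudaErrorNotReady) {
        if (h_req->state == 1) {
            fputs(h_req->message, stdout);  // service the "printf" request
            h_req->state = 2;               // acknowledge to the device
        }
    }
    cudaDeviceSynchronize();
    cudaFreeHost(h_req);
    return 0;
}

Because mapped memory is visible to both host and device while a kernel is in flight, the host can consume and acknowledge requests before the kernel returns, which is what lets file I/O and debugging output behave in real time; a full implementation would use a queue of such slots rather than a single one.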

Tsinghua Science and Technology
Pages 588-598
Cite this article:
Wu W, Qi F, He W, et al. CUDA’s Mapped Memory to Support I/O Functions on GPU. Tsinghua Science and Technology, 2013, 18(6): 588-598. https://doi.org/10.1109/TST.2013.6678904


Received: 16 July 2012
Revised: 04 January 2013
Accepted: 10 May 2013
Published: 06 December 2013
© The author(s) 2013