Open Access

CUDA’s Mapped Memory to Support I/O Functions on GPU

Wei Wu, Fengbin Qi, Wangquan He, Shanshan Wang
Jiangnan Institute of Computing Technology, Wuxi 214083, China

Abstract

The API provided by CUDA helps programmers write high-performance applications for the GPU, but it supports few I/O operations in device code. This paper exploits the characteristics of CUDA's mapped memory to build a dynamic polling service model on the host that supports most common I/O functions, such as file read/write and "printf", from device code. Implementing these I/O functions has only a modest influence on the performance of the original applications; they respond quickly to users' I/O requests, and the "printf" implementation outperforms CUDA's own. The I/O functions also yield an easy and effective real-time method for users to debug their programs. Together they improve the productivity of porting legacy C/C++ code to CUDA and broaden the range of functions available to CUDA programs.
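To make the idea concrete, the following is a minimal sketch, not the authors' implementation: device code posts a request into CUDA mapped (zero-copy) memory, and a host-side polling loop services it while the kernel is still running. The request layout, the state protocol, and names such as IoRequest and worker are illustrative assumptions.

#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// One request slot in mapped (zero-copy) memory, visible to host and device.
struct IoRequest {
    volatile int state;   // 0 = empty, 1 = pending, 2 = serviced
    char message[256];    // payload for a printf-style request
};

// Device side: publish a message in the shared slot and spin until the
// host-side polling service acknowledges it.
__global__ void worker(IoRequest *req)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        const char msg[] = "hello from device code\n";
        for (int i = 0; i < (int)sizeof(msg); ++i)
            req->message[i] = msg[i];
        __threadfence_system();       // make the payload visible to the host
        req->state = 1;               // mark the request pending
        while (req->state != 2) {}    // wait for the host to service it
    }
}

int main()
{
    // Mapped memory must be enabled before the CUDA context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    IoRequest *h_req = nullptr, *d_req = nullptr;
    cudaHostAlloc((void **)&h_req, sizeof(IoRequest), cudaHostAllocMapped);
    memset((void *)h_req, 0, sizeof(IoRequest));
    cudaHostGetDevicePointer((void **)&d_req, (void *)h_req, 0);

    worker<<<1, 32>>>(d_req);

    // Host-side polling service: while the kernel is still running, watch
    // the mapped buffer and service any request that appears.
    while (cudaStreamQuery(0) == cudaErrorNotReady) {
        if (h_req->state == 1) {
            fputs(h_req->message, stdout);  // service the "printf" request
            h_req->state = 2;               // acknowledge to the device
        }
    }
    cudaDeviceSynchronize();
    cudaFreeHost(h_req);
    return 0;
}

Because mapped memory is visible to both host and device while a kernel is in flight, the host can consume and acknowledge requests before the kernel returns, which is what lets file I/O and debugging output behave in real time; a full implementation would use a queue of such slots rather than a single one.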

Tsinghua Science and Technology
Pages 588-598
Cite this article:
Wu W, Qi F, He W, et al. CUDA’s Mapped Memory to Support I/O Functions on GPU. Tsinghua Science and Technology, 2013, 18(6): 588-598. https://doi.org/10.1109/TST.2013.6678904


Received: 16 July 2012
Revised: 04 January 2013
Accepted: 10 May 2013
Published: 06 December 2013
© The author(s) 2013