[1]
S. Zhang, Y. Zhu, K. Zhao, and Y. Zhang, GPU High Performance Computing with CUDA, (in Chinese). Beijing, China: China WaterPower Press, 2009.
[3]
S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU: A compiler framework for automatic translation and optimization, presented at the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Raleigh, NC, USA, 2009.
[4]
R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A hybrid multi-core parallel programming environment, presented at the 1st Workshop on General Purpose Processing on Graphics Processing Units, Boston, MA, USA, 2007.
[8]
J. Breitbart, CuPP - A framework for easy CUDA integration, in Proc. 2009 IEEE International Symposium on Parallel & Distributed Processing, Washington, DC, USA, 2009, pp. 1-8.
[10]
D. B. Kirk and W. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach. Burlington, MA, USA: Morgan Kaufmann Publishers, 2010.
[12]
M. van Vuuren and E. Winands, Iterative approximation of k-limited polling systems, Queueing Systems, vol. 55, no. 3, pp. 161-178, 2007.
[13]
J. Sanders and E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming. Boston, MA, USA: Addison-Wesley, 2010.
[14]
G. Diamos, A. Kerr, and S. Yalamanchili, Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems, presented at the 19th International Conference on Parallel Architectures and Compilation Techniques, Vienna, Austria, 2010.
[16]
C. Müller, S. Frey, M. Strengert, C. Dachsbacher, and T. Ertl, A compute unified system architecture for graphics clusters incorporating data locality, IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 4, pp. 605-617, July/August 2009.
[17]
H. Takizawa, K. Sato, K. Komatsu, and H. Kobayashi, CheCUDA: A checkpoint/restart tool for CUDA applications, presented at the 10th International Conference on Parallel and Distributed Computing, Applications and Technologies, Higashi Hiroshima, Japan, 2009.