Regular Paper

Unimem: Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Main Memory for High Performance Computing

Department of Electrical Engineering and Computer Science, University of California Merced, Merced 95343, U.S.A.

A preliminary version of the paper was published in the proceedings of SC 2017.


Abstract

Non-volatile memory (NVM) provides a scalable and power-efficient alternative to dynamic random access memory (DRAM) as main memory. However, because NVM has relatively high latency and low bandwidth, it is often paired with DRAM to build a heterogeneous memory system (HMS). As a result, the data objects of an application must be carefully placed in either NVM or DRAM to achieve the best performance. In this paper, we introduce a lightweight runtime solution that automatically and transparently manages data placement on HMS without requiring hardware modifications or disruptive changes to applications. Leveraging online profiling and performance models, the runtime solution characterizes the memory access patterns associated with data objects and minimizes unnecessary data movement. Our runtime solution effectively bridges the performance gap between NVM and DRAM. We demonstrate that, with the assistance of software-based data management, using NVM to replace the majority of DRAM can be a feasible solution for future HPC systems.
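
The placement decision described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a simple greedy heuristic in which each data object carries a hypothetical estimated benefit of residing in DRAM and a hypothetical migration cost, and objects are admitted to DRAM in order of net benefit per byte until the capacity is used up. The names `DataObject`, `est_benefit`, `move_cost`, and `place_in_dram` are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataObject:
    name: str
    size_bytes: int
    est_benefit: float  # hypothetical: estimated time saved if the object lives in DRAM
    move_cost: float    # hypothetical: estimated time spent migrating it from NVM


def place_in_dram(objects: List[DataObject], dram_capacity_bytes: int) -> List[str]:
    """Greedy sketch: choose objects for DRAM by net benefit per byte.

    Objects whose migration cost outweighs the estimated benefit are never
    moved, which is one simple way to avoid unnecessary data movement.
    """
    # Keep only objects that are worth moving at all.
    candidates = [
        o for o in objects
        if o.size_bytes > 0 and o.est_benefit > o.move_cost
    ]
    # Highest net benefit per byte first.
    candidates.sort(
        key=lambda o: (o.est_benefit - o.move_cost) / o.size_bytes,
        reverse=True,
    )
    chosen = []
    free = dram_capacity_bytes
    for o in candidates:
        if o.size_bytes <= free:  # admit only if it still fits in DRAM
            chosen.append(o.name)
            free -= o.size_bytes
    return chosen


if __name__ == "__main__":
    objs = [
        DataObject("A", size_bytes=100, est_benefit=10.0, move_cost=1.0),
        DataObject("B", size_bytes=50, est_benefit=2.0, move_cost=3.0),
        DataObject("C", size_bytes=60, est_benefit=9.0, move_cost=0.0),
    ]
    print(place_in_dram(objs, dram_capacity_bytes=120))
```

In this sketch, object B stays in NVM regardless of free DRAM space because moving it would cost more than it is estimated to save. A greedy ordering is only an approximation of the underlying capacity-constrained selection problem; an exact formulation would treat it as a 0/1 knapsack.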

Journal of Computer Science and Technology
Pages 90-109
Cite this article:
Wu K, Li D. Unimem: Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Main Memory for High Performance Computing. Journal of Computer Science and Technology, 2021, 36(1): 90-109. https://doi.org/10.1007/s11390-020-0942-z

Received: 25 August 2020
Accepted: 30 December 2020
Published: 05 January 2021
© Institute of Computing Technology, Chinese Academy of Sciences 2021