Publishing Language: Chinese

Efficient memory allocator for the New Generation Sunway supercomputer

Haojie WANG, Zixuan MA, Liyan ZHENG, Yuanwei WANG, Fei WANG, Jidong ZHAI
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

Supercomputers provide enormous computing power for large applications. Traditional supercomputers have mainly targeted scientific computing problems. However, other applications place new requirements on both the software and hardware design of supercomputers. The New Generation Sunway supercomputer has an inefficient memory allocator when running in the dynamic mode. This study develops an efficient memory allocator, SWAlloc, that reduces the memory allocation time of BaGuaLu, a training framework for brain-scale pretrained models, by up to 75,839 times. Evaluations using PARSEC also show that SWAlloc can speed up memory allocation by up to 51 times (36% on average). SWAlloc has been deployed on the New Generation Sunway supercomputer for use by various large applications, including SWPytorch and SWTensorFlow.

CLC number: TP316 Document code: A Article ID: 1000-0054(2022)05-0943-09

Journal of Tsinghua University (Science and Technology)
Pages 943-951
Cite this article:
WANG H, MA Z, ZHENG L, et al. Efficient memory allocator for the New Generation Sunway supercomputer. Journal of Tsinghua University (Science and Technology), 2022, 62(5): 943-951. https://doi.org/10.16511/j.cnki.qhdxxb.2022.22.007


Received: 09 September 2021
Published: 15 May 2022
© Journal of Tsinghua University (Science and Technology). All rights reserved.