Regular Paper

10-Million Atoms Simulation of First-Principle Package LS3DF

State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 101408, China
Computing System Optimization Laboratory, Huawei Technologies, Beijing 100094, China
Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
School of Materials Science and Engineering, Yancheng Institute of Technology, Yancheng 224051, China

Hai-Bo Li was responsible for algorithm design and participated in writing the paper; Wei-Le Jia was responsible for the overall design and guidance of the work and for algorithmic optimization; Ning-Hui Sun was the chief instructor of the work and was responsible for system optimization.


Abstract

The growing demand for semiconductor device simulation poses a major challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and efficient implementation of LS3DF on the Sugon supercomputer, a Chinese-built supercomputer equipped with deep computing units (DCUs). In terms of algorithmic optimizations, the original all-band conjugate gradient algorithm is refined to converge faster, and mixed precision computing is adopted to improve overall efficiency. In terms of system-level optimizations, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream execution, kernel fusion, and redundant computation removal are proposed to further increase the utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF scales to a 10-million-atom silicon system, attaining a peak performance of 34.8 PFLOPS (21.2% of the machine's theoretical peak). All of these improvements can be adapted to next-generation supercomputers for larger simulations.
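LS3DF reaches linear scaling by dividing the supercell into a grid of cells, solving many small fragments independently, and assembling the total charge density as a signed sum of fragment densities; according to the original LS3DF papers ([12] and [23] below), artificial fragment-boundary effects largely cancel in that signed sum. The sketch below is a minimal illustration, not code from this paper. It assumes the fragment shapes of the original formulation: every corner of the cell grid spawns fragments spanning 1 or 2 cells along each axis, with sign +1 for 2x2x2 and 2x1x1 shapes and -1 for 2x2x1 and 1x1x1 shapes (in all orientations). The function names `fragment_sign` and `coverage` are hypothetical. The script checks that this signed patching counts every cell of a periodic grid exactly once.

```python
from itertools import product


def fragment_sign(size):
    """Return the patching sign of a fragment whose per-axis extents are 1 or 2.

    Each size-2 axis contributes +1 and each size-1 axis contributes -1, giving
    2x2x2 -> +1, 2x2x1 -> -1, 2x1x1 -> +1, 1x1x1 -> -1.
    """
    sign = 1
    for s in size:
        sign *= 1 if s == 2 else -1
    return sign


def coverage(m1, m2, m3):
    """Signed count of how often each cell of a periodic m1 x m2 x m3 grid is
    covered when every grid corner (i, j, k) spawns all eight fragment shapes
    from 1x1x1 up to 2x2x2 cells."""
    dims = (m1, m2, m3)
    cells = list(product(range(m1), range(m2), range(m3)))
    count = {cell: 0 for cell in cells}
    for corner in cells:
        for size in product((1, 2), repeat=3):
            sign = fragment_sign(size)
            # Visit every cell inside this fragment, wrapping around the
            # periodic boundaries, and add the fragment's sign to that cell.
            for offset in product(*(range(s) for s in size)):
                cell = tuple((c + o) % d for c, o, d in zip(corner, offset, dims))
                count[cell] += sign
    return count


if __name__ == "__main__":
    counts = coverage(3, 4, 2)
    # Every cell is counted exactly once, so summing signed fragment
    # quantities assembles exactly one copy of each cell of the system.
    print(set(counts.values()))  # {1}
```

The check holds for any grid size: along each axis, the two size-2 fragments that overlap a given cell contribute +2 and the size-1 fragment contributes -1, and the 3D count is the product of these per-axis sums, (2 - 1)^3 = 1.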

Electronic Supplementary Material

Video
JCST-3011-Video.mp4
JCST-2212-13011-Highlights.pdf (136.1 KB)

References

[1]

Naveh Y, Likharev K K. Shrinking limits of silicon MOSFETs: Numerical study of 10 nm scale devices. Superlattices and Microstructures, 2000, 27(2/3): 111–123. DOI: 10.1006/spmi.1999.0807.

[2]
Ravaioli U. Quantum phenomena in semiconductor nanostructures. In Encyclopedia of Complexity and Systems Science, Meyers R A (ed.), Springer, 2009, pp.7400–7422. DOI: 10.1007/978-0-387-30440-3_439.
[3]

Kohn W, Sham L J. Self-consistent equations including exchange and correlation effects. Physical Review, 1965, 140(4A): A1133–A1138. DOI: 10.1103/PhysRev.140.A1133.

[4]

Payne M C, Teter M P, Allan D C, Arias T A, Joannopoulos J D. Iterative minimization techniques for ab initio total-energy calculations: Molecular dynamics and conjugate gradients. Reviews of Modern Physics, 1992, 64(4): 1045–1097. DOI: 10.1103/RevModPhys.64.1045.

[5]

Kresse G, Furthmüller J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Physical Review B, 1996, 54(16): 11169–11186. DOI: 10.1103/PhysRevB.54.11169.

[6]

Tsuchida E, Tsukada M. Electronic-structure calculations based on the finite-element method. Physical Review B, 1995, 52(8): 5573–5578. DOI: 10.1103/PhysRevB.52.5573.

[7]

Suryanarayana P, Gavini V, Blesgen T, Bhattacharya K, Ortiz M. Non-periodic finite-element formulation of Kohn–Sham density functional theory. Journal of the Mechanics and Physics of Solids, 2010, 58(2): 256–280. DOI: 10.1016/j.jmps.2009.10.002.

[8]

Bao G, Hu G H, Liu D. An h-adaptive finite element solver for the calculations of the electronic structures. Journal of Computational Physics, 2012, 231(14): 4967–4979. DOI: 10.1016/j.jcp.2012.04.002.

[9]
Chen H J, Dai X Y, Gong X G, He L H, Zhou A H. Adaptive finite element approximations for Kohn–Sham models. Multiscale Modeling & Simulation, 2014, 12(4): 1828–1869. DOI: 10.1137/130916096.
[10]
Das S, Motamarri P, Gavini V, Turcksin B, Li Y W, Leback B. Fast, scalable and accurate finite-element based ab initio calculations using mixed precision computing: 46 PFLOPS simulation of a metallic dislocation system. In Proc. the 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2019, Article No. 2. DOI: 10.1145/3295500.3357157.
[11]
Gygi F, Draeger E W, Schulz M, de Supinski B R, Gunnels J A, Austel V, Sexton J C, Franchetti F, Kral S, Ueberhuber C W, Lorenz J. Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform. In Proc. the 2006 ACM/IEEE Conference on Supercomputing, Nov. 2006, Article No. 45. DOI: 10.1145/1188455.1188502.
[12]
Wang L W, Lee B, Shan H Z, Zhao Z J, Meza J, Strohmaier E, Bailey D H. Linearly scaling 3D fragment method for large-scale electronic structure calculations. In Proc. the 2008 ACM/IEEE Conference on Supercomputing, Nov. 2008. DOI: 10.1109/SC.2008.5218327.
[13]
Hasegawa Y, Iwata J, Tsuji M, Takahashi D, Oshiyama A, Minami K, Boku T, Shoji F, Uno A, Kurokawa M, Inoue H, Miyoshi I, Yokokawa M. First-principles calculations of electron states of a silicon nanowire with 100000 atoms on the K computer. In Proc. the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2011, Article No. 1. DOI: 10.1145/2063384.2063386.
[14]

Nakata A, Baker J S, Mujahed S Y, Poulton J T L, Arapan S, Lin J B, Raza Z, Yadav S, Truflandier L, Miyazaki T, Bowler D R. Large scale and linear scaling DFT with the CONQUEST code. The Journal of Chemical Physics, 2020, 152(16): 164112. DOI: 10.1063/5.0005074.

[15]

Kühne T D, Iannuzzi M, Del Ben M, Rybkin V V, Seewald P, Stein F, Laino T, Khaliullin R Z, Schütt O, Schiffmann F, Golze D, Wilhelm J, Chulkov S, Bani-Hashemian M H, Weber V, Borštnik U, Taillefumier M, Jakobovits A S, Lazzaro A, Pabst H, Müller T, Schade R, Guidon M, Andermatt S, Holmberg N, Schenter G K, Hehn A, Bussy A, Belleflamme F, Tabacchi G, Glöß A, Lass M, Bethune I, Mundy C J, Plessl C, Watkins M, Vandevondele J, Krack M, Hutter J. CP2K: An electronic structure and molecular dynamics software package-quickstep: Efficient and accurate electronic structure calculations. The Journal of Chemical Physics, 2020, 152(19): 194103. DOI: 10.1063/5.0007045.

[16]
Shang H H, Li F, Zhang Y Q, Zhang L B, Fu Y, Gao Y X, Wu Y J, Duan X H, Lin R F, Liu X, Liu Y, Chen D X. Extreme-scale ab initio quantum Raman spectra simulations on the leadership HPC system in China. In Proc. the 2021 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2021, Article No. 6. DOI: 10.1145/3458817.3487402.
[17]

Hu W, Qin X M, Jiang Q C, Chen J S, An H, Jia W L, Li F, Liu X, Chen D X, Liu F F, Zhao Y W, Yang J L. High performance computing of DGDFT for tens of thousands of atoms using millions of cores on Sunway TaihuLight. Science Bulletin, 2021, 66(2): 111–119. DOI: 10.1016/j.scib.2020.06.025.

[18]

Schade R, Kenter T, Elgabarty H, Lass M, Schütt O, Lazzaro A, Pabst H, Mohr S, Hutter J, Kühne T D, Plessl C. Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms. Parallel Computing, 2022, 111: 102920. DOI: 10.1016/j.parco.2022.102920.

[19]
Hu W, An H, Guo Z Q, Jiang Q C, Qin X M, Chen J S, Jia W L, Yang C, Luo Z L, Li J L, Wu W T, Tan G M, Jia D N, Lu Q L, Liu F F, Tian M, Li F, Huang Y Q, Wang L Y, Liu S, Yang J L. 2.5 million-atom ab initio electronic-structure simulation of complex metallic heterostructures with DGDFT. In Proc. the 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2022, Article No. 5. DOI: 10.1109/SC41404.2022.00010.
[20]

Goedecker S. Linear scaling electronic structure methods. Reviews of Modern Physics, 1999, 71(4): 1085–1123. DOI: 10.1103/RevModPhys.71.1085.

[21]

Lin L, Lu J F, Car R, E W N. Multipole representation of the Fermi operator with application to the electronic structure analysis of metallic systems. Physical Review B, 2009, 79(11): 115133. DOI: 10.1103/PhysRevB.79.115133.

[22]

Bowler D R, Miyazaki T. O(N) methods in electronic structure calculations. Reports on Progress in Physics, 2012, 75(3): 036503. DOI: 10.1088/0034-4885/75/3/036503.

[23]

Wang L W, Zhao Z J, Meza J. Linear-scaling three-dimensional fragment method for large-scale electronic structure calculations. Physical Review B, 2008, 77(16): 165113. DOI: 10.1103/PhysRevB.77.165113.

[24]
Ye M, Jiang X W, Li S S, Wang L W. Large-scale ab initio quantum transport simulation of nanosized copper interconnects: The effects of defects and quantum interferences. In Proc. the 2019 IEEE International Electron Devices Meeting (IEDM), Dec. 2019, Article No. 24. DOI: 10.1109/IEDM19573.2019.8993549.
[25]
Wang L W, Jia W L, Cao Z Y, Wang L, Chi X B, Gao W G. GPU speedup of the plane wave pseudopotential density functional theory calculations. In APS March Meeting Abstracts, Feb. 27–March 2, 2012, Abstract ID T7.008.
[26]

Tomov S, Langou J, Dongarra J, Canning A, Wang L W. Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures. International Journal of Computational Science and Engineering, 2006, 2(3/4): 205–212. DOI: 10.1504/IJCSE.2006.012774.

[27]

Kohn W. Density functional and density matrix method scaling linearly with the number of atoms. Physical Review Letters, 1996, 76(17): 3168–3171. DOI: 10.1103/PhysRevLett.76.3168.

[28]

Auckenthaler T, Blum V, Bungartz H J, Huckle T, Johanni R, Krämer L, Lang B, Lederer H, Willems P R. Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations. Parallel Computing, 2011, 37(12): 783–794. DOI: 10.1016/j.parco.2011.05.002.

[29]

Yang C, Meza J C, Wang L W. A trust region direct constrained minimization algorithm for the Kohn–Sham equation. SIAM Journal on Scientific Computing, 2007, 29(5): 1854–1875. DOI: 10.1137/060661442.

[30]

Vecharynski E, Yang C, Pask J E. A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a Hermitian matrix. Journal of Computational Physics, 2015, 290: 73–89. DOI: 10.1016/j.jcp.2015.02.030.

[31]

Knyazev A V. Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM Journal on Scientific Computing, 2001, 23(2): 517–541. DOI: 10.1137/S1064827500366124.

[32]

Jia W L, Cao Z Y, Wang L, Fu J Y, Chi X B, Gao W G, Wang L W. The analysis of a plane wave pseudopotential density functional theory code on a GPU machine. Computer Physics Communications, 2013, 184(1): 9–18. DOI: 10.1016/j.cpc.2012.08.002.

[33]
Fattebert J L, Osei-Kuffuor D, Draeger E W, Ogitsu T, Krauss W D. Modeling dilute solutions using first-principles molecular dynamics: Computing more than a million atoms with over a million cores. In Proc. the 2016 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2016, pp.12–22. DOI: 10.1109/SC.2016.88.
[34]
Higham N J. Accuracy and Stability of Numerical Algorithms. SIAM, 2002.
[35]
Sun J G. Matrix Perturbation Analysis (2nd edition). Science Press, 2001. (in Chinese)
[36]

Hohenberg P, Kohn W. Inhomogeneous electron gas. Physical Review, 1964, 136(3B): B864–B871. DOI: 10.1103/PhysRev.136.B864.

[37]
Gabriel E, Fagg G, Bosilca G, et al. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proc. the 11th European PVM/MPI Users' Group Meeting, Sept. 2004, pp.97–104. DOI: 10.1109/CLUSTR.2006.311904.
[38]

Van Zee F G, van de Geijn R A. BLIS: A framework for rapidly instantiating BLAS functionality. ACM Transactions on Mathematical Software, 2015, 41(3): Article No. 14. DOI: 10.1145/2764454.

[39]
Anderson E, Bai Z, Bischof C, Blackford L S, Demmel J, Dongarra J, Du Croz J, Greenbaum A, Hammarling S, McKenney A, Sorensen D. LAPACK Users’ Guide (3rd edition). Society for Industrial and Applied Mathematics, 1999.
[40]
Blackford L S, Choi J, Cleary A, D’Azevedo E, Demmel J, Dhillon I, Dongarra J, Hammarling S, Henry G, Petitet A, Stanley K, Walker D, Whaley R C. ScaLAPACK Users’ Guide. Society for Industrial and Applied Mathematics, 1997.
[41]

Bosma W, Cannon J, Playoust C. The Magma algebra system I: The user language. Journal of Symbolic Computation, 1997, 24(3/4): 235–265. DOI: 10.1006/jsco.1996.0125.

Journal of Computer Science and Technology
Pages 45-62
Cite this article:
Yan Y-J, Li H-B, Zhao T, et al. 10-Million Atoms Simulation of First-Principle Package LS3DF. Journal of Computer Science and Technology, 2024, 39(1): 45-62. https://doi.org/10.1007/s11390-023-3011-6


Received: 21 February 2023
Accepted: 25 April 2023
Published: 25 January 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024