Regular Paper

Design and Implementation of the Tianhe-2 Data Storage and Management System

National Supercomputer Center in Guangzhou, Sun Yat-sen University, Guangzhou 510000, China
College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract

With the convergence of high-performance computing (HPC), big data, and artificial intelligence (AI), the HPC community is pushing for “triple use” systems to expedite scientific discovery. However, supporting these converged applications on HPC systems poses formidable storage and data management challenges, owing to the explosive growth of scientific data and the fundamental differences in I/O characteristics among HPC, big data, and AI workloads. In this paper, we discuss the driving forces behind this convergence, highlight three data management challenges, and summarize our efforts to address them on a typical HPC system at the parallel file system, data management middleware, and user application levels. As HPC systems approach the exascale era, this paper sheds light on how to enable application-driven data management as a preliminary step toward the deep convergence of exascale computing ecosystems, big data, and AI.

Electronic Supplementary Material

jcst-35-1-27-Highlights.pdf (530 KB)

Journal of Computer Science and Technology
Pages 27-46
Cite this article:
Lu Y-T, Cheng P, Chen Z-G. Design and Implementation of the Tianhe-2 Data Storage and Management System. Journal of Computer Science and Technology, 2020, 35(1): 27-46. https://doi.org/10.1007/s11390-020-9799-4

Received: 15 July 2019
Revised: 14 October 2019
Published: 17 January 2020
©Institute of Computing Technology, Chinese Academy of Sciences 2020