Regular Paper

ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems

Lawrence Berkeley National Laboratory, Berkeley, CA 94720, U.S.A.
The HDF Group, Champaign, IL 61820, U.S.A.
Argonne National Laboratory, Lemont, IL 60439, U.S.A.

Abstract

Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology that enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy expands to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications and one of the most widely used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include the Virtual Object Layer (VOL), the Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. We introduce these features and their implementations, and present the performance and feature benefits they bring to applications and other libraries.
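
For context, the sketch below (not code from the paper; the file name, dataset name, and sizes are illustrative) shows how an application typically performs collective parallel I/O through HDF5's MPI-IO file driver: each MPI rank selects a hyperslab of a shared dataset and all ranks write it collectively. Features the paper describes, such as the Virtual Object Layer and the Data Elevator, are intended to work underneath this same application-facing H5F/H5D API.

/* Minimal sketch, not from the paper: each MPI rank writes its own block
 * of a shared 1-D dataset collectively through HDF5's MPI-IO driver. */
#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

#define N_PER_RANK 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* File access property list: open the file with the parallel MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Global dataspace holding N_PER_RANK elements per rank. */
    hsize_t dims[1] = { (hsize_t)nranks * N_PER_RANK };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_INT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own contiguous hyperslab of the file dataspace. */
    hsize_t offset[1] = { (hsize_t)rank * N_PER_RANK };
    hsize_t count[1]  = { N_PER_RANK };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    int *buf = malloc(N_PER_RANK * sizeof(int));
    for (int i = 0; i < N_PER_RANK; i++) buf[i] = rank;

    /* Dataset transfer property list: request collective (coordinated) I/O. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, buf);

    free(buf);
    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

With a parallel HDF5 build, a program like this would be compiled with the h5pcc wrapper (or an MPI compiler plus HDF5 flags) and launched with mpirun or srun.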

Electronic Supplementary Material

jcst-35-1-145-Highlights.pdf (1.1 MB)

Journal of Computer Science and Technology
Pages 145-160
Cite this article:
Byna S, Breitenfeld MS, Dong B, et al. ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems. Journal of Computer Science and Technology, 2020, 35(1): 145-160. https://doi.org/10.1007/s11390-020-9822-9

Metrics: 361 views; citations: 30 (Crossref), N/A (Web of Science), 32 (Scopus), 0 (CSCD).

Received: 06 July 2019
Revised: 28 August 2019
Published: 17 January 2020
©Institute of Computing Technology, Chinese Academy of Sciences 2020