Regular Paper

GekkoFS — A Temporary Burst Buffer File System for HPC Applications

Zentrum für Datenverarbeitung, Johannes Gutenberg University Mainz, Mainz 55128, Germany
Barcelona Supercomputing Center, Barcelona 08034, Spain
Computer Architecture Department, Universitat Politècnica de Catalunya, Barcelona 08034, Spain

Abstract

Many scientific fields increasingly use high-performance computing (HPC) to process and analyze massive amounts of experimental data, while storage systems in today’s HPC environments have to cope with new access patterns. These patterns include many metadata operations, small I/O requests, and randomized file I/O, whereas general-purpose parallel file systems have been optimized for sequential shared access to large files. Burst buffer file systems create a separate file system that applications can use to store temporary data. They aggregate node-local storage available within the compute nodes or use dedicated SSD clusters, and they offer a peak bandwidth higher than that of the backend parallel file system without interfering with it. However, burst buffer file systems typically offer many features that a scientific application, running in isolation for a limited amount of time, does not require. We present GekkoFS, a temporary, highly scalable file system that has been specifically optimized for these use cases. GekkoFS provides relaxed POSIX semantics, offering only the features that most (not all) applications actually require. GekkoFS is therefore able to provide scalable I/O performance and reaches millions of metadata operations even with a small number of nodes, significantly outperforming common parallel file systems.

Electronic Supplementary Material

jcst-35-1-72-Highlights.pdf (648.9 KB)

Journal of Computer Science and Technology
Pages 72-91
Cite this article:
Vef M-A, Moti N, Süß T, et al. GekkoFS — A Temporary Burst Buffer File System for HPC Applications. Journal of Computer Science and Technology, 2020, 35(1): 72-91. https://doi.org/10.1007/s11390-020-9797-6

Received: 30 June 2019
Revised: 03 October 2019
Published: 17 January 2020
©Institute of Computing Technology, Chinese Academy of Sciences 2020