Survey

A Survey of Non-Volatile Main Memory File Systems

State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Research Center for Advanced Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China

Abstract

Non-volatile memories (NVMs) provide lower latency and higher bandwidth than block devices. NVMs are also byte-addressable and persistent, so they can serve as memory-level storage devices (non-volatile main memory, NVMM). These features change the storage hierarchy and allow the CPU to access persistent data using load/store instructions, which makes it possible to build a file system directly on NVMM. However, traditional file systems are designed for slow block devices and rely on a deep, complex software stack to optimize performance. On NVMM, this design makes software overhead the dominant factor in file system performance. Scalability, crash consistency, data protection, and cross-media storage should also be reconsidered in NVMM file systems. We survey existing work on optimizing NVMM file systems. First, we analyze the problems that arise when traditional file systems are used directly on NVMM, including heavy software overhead, limited scalability, and inappropriate consistency guarantee techniques. Second, we summarize the techniques of 30 typical NVMM file systems and analyze their advantages and disadvantages. Finally, we provide a few suggestions for designing a high-performance NVMM file system based on real hardware, the Optane DC persistent memory module. Specifically, we suggest applying various techniques to reduce software overhead, improving the scalability of the virtual file system (VFS), adopting highly concurrent data structures (e.g., locks and indexes), using memory protection keys (MPK) for data protection, and carefully designing data placement and migration for cross-media file systems.
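
To make the load/store access model mentioned in the abstract concrete, the following is a minimal sketch, not taken from the surveyed systems. It memory-maps a file and updates it with ordinary slice assignments instead of read/write system calls. It assumes a regular temporary file as a stand-in for a file on a DAX-mounted NVMM file system; the helper names (`persist_counter`, `load_counter`, `demo`) are hypothetical, and `flush()` only approximates the cache-line flush and fence that real NVMM code would issue.

```python
import mmap
import os
import tempfile


def persist_counter(path: str, value: int) -> None:
    """Store an 8-byte counter through a memory mapping, then flush it."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            # Plain store into the mapping; no write() system call is involved.
            m[0:8] = value.to_bytes(8, "little")
            # Stand-in (assumption) for the flush + fence used on real NVMM.
            m.flush()


def load_counter(path: str) -> int:
    """Read the 8-byte counter back through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return int.from_bytes(m[0:8], "little")


def demo() -> int:
    """Create a 4 KiB backing file, persist a value, and load it again."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4096)  # the mapping needs a non-empty file
        os.close(fd)
        persist_counter(path, 42)
        return load_counter(path)
    finally:
        os.remove(path)
```

Calling `demo()` round-trips the value through the mapping. On real NVMM hardware the same pattern would typically bypass the page cache entirely, which is why the software stack above it becomes the dominant cost.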

Electronic Supplementary Material

Video
JCST-2010-11054-Vidoe.mp4
JCST-2010-11054-Highlights.pdf (146.1 KB)


Journal of Computer Science and Technology
Pages 348-372
Cite this article:
Wang Y, Jia W-Q, Jiang D-J, et al. A Survey of Non-Volatile Main Memory File Systems. Journal of Computer Science and Technology, 2023, 38(2): 348-372. https://doi.org/10.1007/s11390-023-1054-3

Received: 09 October 2020
Accepted: 12 March 2023
Published: 30 March 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023