Regular Paper

Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems

Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA 70803, U.S.A.
Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education of China, Beijing 100872, China
School of Information, Renmin University of China, Beijing 100872, China
Shelby Center for Engineering Technology, Department of Computer Science and Software Engineering, Samuel Ginn College of Engineering, Auburn University, Auburn, AL 36849-5347, U.S.A.

A preliminary version of the paper was published in the Proceedings of MSST 2014.


Abstract

Deduplication is widely used in both enterprise storage systems and cloud storage. To overcome the performance challenge of selective restore operations in deduplication systems, a solid-state-drive-based (SSD-based) read cache can be deployed to speed up restores by dynamically caching popular restore content. Unfortunately, the frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten the lifetime of SSDs and slow down I/O processing in the SSDs. To address this problem, we propose a new solution, LOP-Cache, which greatly improves both the write durability of SSDs and I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long period of time to reduce the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by 37.3% on average, at the cost of a small SSD-based cache whose capacity is only 5.56% of the deduplicated data. Importantly, LOP-Cache extends the lifetime of SSDs by a factor of 9.77. These results show that LOP-Cache offers a cost-efficient SSD-based read cache solution for boosting the performance of selective restore in deduplication systems.
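The contrast drawn in the abstract, between a classical scheme that writes to the SSD on every miss and a cache that admits only data shown to be popular, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions and not the LOP-Cache algorithm itself: the function names, the access-count threshold used as a popularity proxy, and the synthetic trace are all hypothetical and do not come from the paper.

```python
from collections import Counter, OrderedDict


def lru_cache_run(trace, capacity):
    """Classical LRU: every miss writes the chunk into the SSD cache."""
    cache = OrderedDict()
    hits = writes = 0
    for chunk in trace:
        if chunk in cache:
            cache.move_to_end(chunk)       # refresh recency on a hit
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[chunk] = True
            writes += 1                    # each miss costs one SSD write
    return hits, writes


def filtered_cache_run(trace, capacity, threshold=3):
    """Hypothetical admission filter: write a chunk to the SSD only after
    it has been requested `threshold` times, and replace a resident chunk
    only when the newcomer has been requested more often."""
    counts = Counter()
    cache = set()
    hits = writes = 0
    for chunk in trace:
        counts[chunk] += 1
        if chunk in cache:
            hits += 1
            continue
        if counts[chunk] < threshold:
            continue                       # not yet popular: skip the write
        if len(cache) >= capacity:
            victim = min(cache, key=lambda c: counts[c])
            if counts[victim] >= counts[chunk]:
                continue                   # resident data is as popular: keep it
            cache.remove(victim)
        cache.add(chunk)
        writes += 1
    return hits, writes


# Synthetic trace (an assumption): two recurring chunks mixed with
# chunks that are requested only once.
trace = []
for i in range(100):
    trace += ["hot0", "hot1", f"cold{i}", f"cold{i}b"]

lru_result = lru_cache_run(trace, capacity=2)
filtered_result = filtered_cache_run(trace, capacity=2)
print("LRU      (hits, writes):", lru_result)
print("Filtered (hits, writes):", filtered_result)
```

On this artificial trace the one-off chunks keep pushing the recurring chunks out of the LRU cache, so every access becomes an SSD write, while the filtered cache writes only the two recurring chunks. Real restore workloads are far less regular, so the numbers here say nothing about the gains reported above.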

Electronic Supplementary Material

jcst-33-1-58-Highlights.pdf (396.5 KB)

Journal of Computer Science and Technology
Pages 58-78
Cite this article:
Liu J, Chai Y-P, Qin X, et al. Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems. Journal of Computer Science and Technology, 2018, 33(1): 58-78. https://doi.org/10.1007/s11390-018-1808-5


Received: 06 December 2016
Revised: 07 May 2017
Published: 26 January 2018
©2018 LLC & Science Press, China