Data Temperature Informed Streaming for Optimising Large-Scale Multi-Tiered Storage

Dominic Davies-Tagg; Ashiq Anjum; Ali Zahir; Lu Liu; Muhammad Usman Yaseen; Nick Antonopoulos

doi:10.26599/BDMA.2023.9020039

Open Access

Dominic Davies-Tagg¹, Ashiq Anjum², Ali Zahir², Lu Liu², Muhammad Usman Yaseen³, Nick Antonopoulos⁴

¹ Department of Computing, University of Derby, Derby, DE22 1GB, UK

² Department of Informatics, University of Leicester, Leicester, LE1 7RH, UK

³ Department of Computer Science, COMSATS University Islamabad, Islamabad 45550, Pakistan

⁴ Edinburgh Napier University, Edinburgh, EH11 4BN, UK

Abstract

Data temperature is a response to the ever-growing amount of data. These data have to be stored, but it has been observed that only a small portion of the data are accessed frequently at any one time. This leads to the concept of hot and cold data. Cold data can be migrated away from high-performance nodes to free up performance for higher-priority data. Existing studies classify hot and cold data primarily on the basis of data age and usage frequency. We present this as a limitation in the current implementation of data temperature, because classifying by age assumes that all new data have priority, and classifying by usage is purely reactive. We propose new variables and conditions that enable smarter decisions about which data are hot or cold and allow greater user control over data location and movement. We identify new metadata variables and user-defined variables to extend the current data temperature value. We further establish rules and conditions for limiting unnecessary movement of the data, which helps to prevent wasted input/output (I/O) costs. We also propose a hybrid algorithm that combines existing variables with the new variables and conditions into a single data temperature. The proposed system provides higher accuracy, increases performance, and gives greater user control for optimal positioning of data within multi-tiered storage solutions.
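The idea of folding several variables into a single data temperature, and of limiting unnecessary movement, can be illustrated with a minimal sketch. This is not the hybrid algorithm of the paper, which the abstract does not specify. The field names, the weights, the normalisation, and the two thresholds below are all hypothetical choices made only for illustration: age and usage stand in for the existing variables, `user_priority` stands in for a user-defined variable, and the gap between `promote_at` and `demote_at` is one simple way to keep borderline data from moving back and forth between tiers.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical per-object metadata; none of these names come from the paper."""
    age_days: float          # time since creation
    accesses_per_day: float  # observed usage frequency
    user_priority: float     # user-defined value in [0, 1]
    tier: str                # current tier: "hot" or "cold"


def temperature(item: Item, w_age: float = 0.2, w_usage: float = 0.5,
                w_user: float = 0.3) -> float:
    """Combine the variables into one score in [0, 1] (illustrative weights)."""
    recency = 1.0 / (1.0 + item.age_days)                       # newer -> closer to 1
    usage = item.accesses_per_day / (1.0 + item.accesses_per_day)  # saturates at 1
    return w_age * recency + w_usage * usage + w_user * item.user_priority


def target_tier(item: Item, promote_at: float = 0.6,
                demote_at: float = 0.4) -> str:
    """Return the tier the item should occupy.

    Data move only when the score crosses the threshold on the far side of
    the band, so scores between demote_at and promote_at cause no migration.
    """
    t = temperature(item)
    if item.tier == "cold" and t >= promote_at:
        return "hot"
    if item.tier == "hot" and t <= demote_at:
        return "cold"
    return item.tier


if __name__ == "__main__":
    old_but_important = Item(age_days=365, accesses_per_day=9,
                             user_priority=1.0, tier="cold")
    new_but_unused = Item(age_days=0, accesses_per_day=0,
                          user_priority=0.0, tier="hot")
    print(target_tier(old_but_important))
    print(target_tier(new_but_unused))
```

Under these assumed weights, an old but heavily used, high-priority item is promoted, while a new but unused item is demoted, which is the behaviour an age-only classification would get wrong.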

Keywords

data temperature; hot and cold data; multi-tiered storage; metadata variable; multi-temperature system

Big Data Mining and Analytics

Volume 7, Issue 2, June 2024

Pages 371-398

DOI: 10.26599/BDMA.2023.9020039

Cite this article:

Davies-Tagg D, Anjum A, Zahir A, et al. Data Temperature Informed Streaming for Optimising Large-Scale Multi-Tiered Storage. Big Data Mining and Analytics, 2024, 7(2): 371-398. https://doi.org/10.26599/BDMA.2023.9020039