Open Access

Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data

Department of Computer Engineering and Application, GLA University, Mathura 281406, India
College of Computer Sciences and Information Technology, King Faisal University, Hofuf 31982, Saudi Arabia
School of Computing, Graphic Era Hill University, Dehradun 248002, India

Abstract

We live in an age in which data is being created by everything around us. Data generation rates are so high that they create pressure to implement low-cost and straightforward data storage and recovery processes. The MapReduce model is used to create parallel, distributed algorithms that run on a cluster over large datasets. Building on Hadoop's MapReduce framework, which is developed by a community for non-commercial use, this work offers a new algorithm for resolving such problems in commercial applications, motivated by the disproportionate and uneven results observed on Hadoop clusters. The expected results were obtained in the experiments conducted in this work. The tasks involved include scheduling, matching the data positions of matrices, clustering, and accurate, internally reliable mapping, all kept close together to reduce running and execution times. The mapper and its output have been implemented, and the map output is passed to the reduce function. The input and output key/value pairs for job execution have been defined. This paper focuses on evaluating this technique for the efficient retrieval of large volumes of data. The technique provides capabilities for managing a massive database, covering storage and indexing techniques, query distribution, scalability, and performance in heterogeneous environments. The results show that the proposed work reduces the data processing time by 30%.
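The key/value flow the abstract refers to (input pairs go through the map function, intermediate pairs are grouped by key, and the reduce function produces output pairs) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the word-count job and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, chosen only to show the MapReduce contract.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[int, str]


def map_fn(offset: int, line: str) -> Iterable[Tuple[str, int]]:
    """Hypothetical mapper: emit one intermediate (word, 1) pair per word."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum all values grouped under one key."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[Record],
    mapper: Callable[[int, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    # Map phase plus shuffle: group every intermediate value by its key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    # Reduce phase: visit keys in sorted order (Hadoop sorts keys before reducing).
    return dict(reducer(k, grouped[k]) for k in sorted(grouped))


# Input key/value pairs: (line number, line text). Hadoop's TextInputFormat
# would use the byte offset of each line as the key instead.
records = list(enumerate(["big data needs big clusters", "data locality matters"]))
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
# {'big': 2, 'clusters': 1, 'data': 2, 'locality': 1, 'matters': 1, 'needs': 1}
```

In a real Hadoop job the shuffle moves intermediate pairs across the network from mapper nodes to reducer nodes, and keys are partitioned among several reducers; this single-process version keeps only the key/value contract between the two phases.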

Big Data Mining and Analytics
Pages 465-477
Cite this article:
Kumar A, Varshney N, Bhatiya S, et al. Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data. Big Data Mining and Analytics, 2023, 6(4): 465-477. https://doi.org/10.26599/BDMA.2022.9020026

Received: 26 April 2022
Revised: 07 July 2022
Accepted: 14 July 2022
Published: 29 August 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).