
Open Access

Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data

Ankit Kumar¹, Neeraj Varshney¹, Surbhi Bhatiya², Kamred Udham Singh³

1 Department of Computer Engineering and Application, GLA University, Mathura 281406, India

2 College of Computer Sciences and Information Technology, King Faisal University, Hofuf 31982, Saudi Arabia

3 School of Computing, Graphic Era Hill University, Dehradun 248002, India


Abstract

We live in an age in which data is being created all around us. Data generation rates are so high that they create pressure to implement cost-effective and straightforward data storage and retrieval processes. The MapReduce programming model is used to build parallel, distributed algorithms that run on a cluster over large datasets. Hadoop's MapReduce framework, developed by an open-source (non-commercial) community, is used here to offer a new algorithm for resolving such problems in commercial applications, informed by insights from uneven or skewed Hadoop cluster results. The expected results were obtained in the experiments conducted in this work; the approach covers setting job schedules, matching the data positions of matrices, clustering data in advance, and accurate mapping, with related data kept close together to reduce running and execution times. The mapper output handling has been implemented, and the map output is passed to the reduce function. The input and output key/value pairs for execution have been defined. This paper focuses on evaluating this technique for the efficient retrieval of large volumes of data. The technique provides the capabilities needed to manage a massive database, from storage and indexing techniques to query distribution, scalability, and performance in heterogeneous environments. The results show that the proposed work reduces data processing time by 30%.
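The MapReduce data flow described above (input key/value pairs, mapper output passed to the reduce function, output key/value pairs) can be sketched as a single-process Python simulation. This is a minimal illustration under stated assumptions, not the authors' algorithm, which the abstract does not specify: it uses a generic word-count job, and `run_mapreduce`, `word_count_mapper`, and `sum_reducer` are hypothetical names. A real Hadoop job would run these phases in parallel across cluster nodes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_mapreduce(
    records: Iterable[Tuple[int, str]],
    mapper: Callable[[int, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Iterator[Tuple[str, int]]],
) -> Dict[str, int]:
    """Hypothetical single-process simulation of map -> shuffle -> reduce."""
    # Map phase: each input (key, value) pair yields intermediate (key, value) pairs.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for in_key, in_value in records:
        for out_key, out_value in mapper(in_key, in_value):
            # Shuffle: group intermediate values by key, as the framework would.
            intermediate[out_key].append(out_value)
    # Reduce phase: visit keys in sorted order (Hadoop sorts keys before reducing).
    result: Dict[str, int] = {}
    for key in sorted(intermediate):
        for out_key, out_value in reducer(key, intermediate[key]):
            result[out_key] = out_value
    return result


def word_count_mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    # Input pair: (byte offset, line of text), mirroring Hadoop's TextInputFormat.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(word: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    # Output pair: (word, total count across all mapper outputs).
    yield word, sum(counts)


if __name__ == "__main__":
    lines = ["big data needs Hadoop", "Hadoop runs MapReduce over big data"]
    # Build (offset, line) input pairs, counting one extra byte per newline.
    records = []
    offset = 0
    for line in lines:
        records.append((offset, line))
        offset += len(line) + 1
    print(run_mapreduce(records, word_count_mapper, sum_reducer))
    # prints {'big': 2, 'data': 2, 'hadoop': 2, 'mapreduce': 1, 'needs': 1, 'over': 1, 'runs': 1}
```

Swapping in a different mapper/reducer pair changes the job without touching `run_mapreduce`, which is the separation of concerns the MapReduce model relies on; in Hadoop, the shuffle would additionally partition keys across several reducers.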

Keywords

big data; Hadoop; MapReduce; resource allocation; query management


Big Data Mining and Analytics

Volume 6, Issue 4, December 2023

Pages 465-477

DOI: 10.26599/BDMA.2022.9020026

Cite this article:

Kumar A, Varshney N, Bhatiya S, et al. Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data. Big Data Mining and Analytics, 2023, 6(4): 465-477. https://doi.org/10.26599/BDMA.2022.9020026
