
Open Access

An Adaptive Scalable Data Pipeline for Multiclass Attack Classification in Large-Scale IoT Networks

Selvam Saravanan, Uma Maheswari Balasubramanian

Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India


Abstract

Large-scale Internet of Things (IoT) networks typically generate high-velocity network traffic streams. Attackers use IoT devices to build botnets and launch attacks such as DDoS, spamming, cryptocurrency mining, and phishing. Service providers of large-scale IoT networks therefore need a data pipeline that collects the vast volume of network traffic data from IoT devices, stores it, analyzes it, and reports the malicious IoT devices and the types of attacks. Further, the attacks originating from IoT devices are dynamic: attackers launch one kind of attack at one time and another kind at another time, and the numbers of attack and benign instances also vary over time. This change in attack patterns is called concept drift. Hence, the attack detection system must learn continuously from the ever-changing real-time attack patterns in large-scale IoT network traffic. To meet this requirement, we propose a data pipeline built with Apache Kafka, Apache Spark Structured Streaming, and MongoDB that adapts to changing attack patterns in real time and classifies attacks in large-scale IoT networks. When concept drift is detected, the proposed system retrains the classifier with the instances that cause the drift and a representative subsample of instances from the previous training of the model. The proposed approach is evaluated on a recent dataset, IoT-23, which consists of benign and several attack instances from various IoT devices. The proposed system improves attack classification accuracy from 97.8% to 99.46%. The training time of the distributed random forest algorithm is also studied by varying the number of cores in the Apache Spark environment.
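
The retraining step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how drift is detected or how the representative subsample is drawn, so the accuracy-drop check, the per-class (stratified) random sampling, and the function names, parameters, and class labels below are all assumptions made for illustration. They are not the authors' implementation, which runs on Apache Spark.

```python
import random
from collections import defaultdict


def drift_detected(window_correct, baseline_accuracy, tolerance=0.05):
    """Assumed detector: flag drift when accuracy over the recent window
    falls more than `tolerance` below the baseline accuracy."""
    if not window_correct:
        return False
    window_accuracy = sum(window_correct) / len(window_correct)
    return window_accuracy < baseline_accuracy - tolerance


def stratified_subsample(instances, fraction, seed=0):
    """Assumed subsampling: keep about `fraction` of each class,
    with at least one instance per class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in instances:
        by_label[label].append((features, label))
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


def build_retraining_set(previous_training, drift_instances, fraction=0.1, seed=0):
    """Combine the drift-causing instances with a subsample of the
    data used in the previous training of the model."""
    return list(drift_instances) + stratified_subsample(previous_training, fraction, seed)


if __name__ == "__main__":
    # Hypothetical toy data: (features, label) pairs.
    previous = [((i,), "benign") for i in range(100)] + [((i,), "ddos") for i in range(20)]
    drifted = [((i,), "spam") for i in range(5)]
    recent_window = [True] * 6 + [False] * 4  # per-instance correctness flags
    if drift_detected(recent_window, baseline_accuracy=0.978):
        retraining_set = build_retraining_set(previous, drifted, fraction=0.1)
        print(len(retraining_set), "instances selected for retraining")
```

Sampling per class rather than over the whole previous training set is one way to keep rare attack classes represented after subsampling. Whether the proposed system does this is not stated in the abstract.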

Keywords

Internet of Things (IoT); Apache Spark; Apache Kafka; MongoDB; streaming; concept drift

Big Data Mining and Analytics

Volume 7, Issue 2, June 2024

Pages 500-511

DOI: 10.26599/BDMA.2023.9020027

Cite this article:

Saravanan S, Maheswari Balasubramanian U. An Adaptive Scalable Data Pipeline for Multiclass Attack Classification in Large-Scale IoT Networks. Big Data Mining and Analytics, 2024, 7(2): 500-511. https://doi.org/10.26599/BDMA.2023.9020027
