Large-scale Internet of Things (IoT) networks typically generate high-velocity network traffic streams. Attackers use IoT devices to build botnets and launch attacks such as DDoS, spamming, cryptocurrency mining, and phishing. Service providers of large-scale IoT networks therefore need a data pipeline that collects the vast volume of network traffic data from IoT devices, stores it, analyzes it, and reports the malicious devices and the types of attacks. Moreover, attacks originating from IoT devices are dynamic: attackers launch one kind of attack at one time and a different kind at another, and the numbers of attack and benign instances also vary over time. This change in attack patterns is called concept drift. Hence, an attack detection system must learn continuously from the ever-changing, real-time attack patterns in large-scale IoT network traffic. To meet this requirement, we propose a data pipeline built on Apache Kafka, Apache Spark Structured Streaming, and MongoDB that adapts to changing attack patterns in real time and classifies attacks in large-scale IoT networks. When concept drift is detected, the proposed system retrains the classifier on the instances that caused the drift together with a representative subsample of the instances used in the model's previous training. The proposed approach is evaluated on a recent dataset, IoT23, which consists of benign instances and several types of attack instances from various IoT devices. The proposed system improves attack classification accuracy from 97.8% to 99.46%. The training time of the distributed random forest algorithm is also studied by varying the number of cores in the Apache Spark environment.
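The retraining step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "representative" means a class-stratified random subsample, it uses plain Python lists in place of Spark DataFrames, and the names `stratified_subsample`, `build_retraining_set`, and the `fraction` parameter are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(instances, fraction, seed=0):
    """Keep roughly `fraction` of each class, and at least one instance per class.

    Assumption: class-stratified sampling stands in for the paper's
    "representative subsample"; the source does not specify the method.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in instances:
        by_label[label].append((features, label))

    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


def build_retraining_set(previous_training, drift_instances, fraction=0.1, seed=0):
    """Combine a subsample of the old training data with the drift instances."""
    kept = stratified_subsample(previous_training, fraction, seed)
    return kept + list(drift_instances)


# Toy data: (features, label) pairs; the labels are illustrative only.
previous = [((i,), "benign") for i in range(80)] + [((i,), "ddos") for i in range(20)]
drift = [((i,), "c&c") for i in range(5)]

retrain = build_retraining_set(previous, drift, fraction=0.1)
print(len(retrain))  # 8 benign + 2 ddos + 5 drift instances = 15
```

Keeping a subsample of the earlier training data alongside the drift instances is one way to let the retrained classifier adapt to new patterns without discarding the classes it already recognized.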
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).