Open Access

DGA-Based Botnet Detection Toward Imbalanced Multiclass Learning

College of Cybersecurity, Sichuan University, Chengdu 610065, China.
Cybersecurity Research Institute, Sichuan University, Chengdu 610065, China.
An erratum to this article is available online at:

Abstract

Botnets based on the Domain Generation Algorithm (DGA) mechanism are highly concealed and robust, which poses a serious challenge to mainstream detection methods. Research on DGA detection is further impeded by the complexity of DGA families and the imbalance among their samples. Existing work treats the sample size of each DGA family as the most important determinant of the resampling proportion; it therefore ignores differences in the characteristics of the samples and does not achieve the optimal resampling effect. This paper proposes a Long Short-Term Memory-based Property and Quantity Dependent Optimization (LSTM.PQDO) method. The method uses an LSTM to automatically mine comprehensive features of DGA domain names. It then iteratively updates the resampling proportion, taking into account both the original number and the characteristics of the samples, and heuristically searches around the initial solution for a better one in a promising direction, so that the resampling proportion is optimized dynamically. Experimental results show that LSTM.PQDO outperforms existing models on imbalanced datasets and can serve as a reference for resampling tasks in similar scenarios.
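
The idea of searching for a better resampling proportion around a quantity-based starting point can be illustrated with a minimal sketch. This is not the LSTM.PQDO algorithm itself, whose details the abstract does not give; it is a plain coordinate-wise local search under stated assumptions. The function names, the family names, the step size, and the `toy_score` function (a stand-in for the validation score of a classifier trained on the resampled data) are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Proportions = Dict[str, float]


def initial_proportions(counts: Dict[str, int]) -> Proportions:
    """Quantity-only start: the factor that lifts each class to the largest class size."""
    largest = max(counts.values())
    return {family: largest / n for family, n in counts.items()}


def local_search(
    start: Proportions,
    score: Callable[[Proportions], float],
    step: float = 0.25,
    max_rounds: int = 50,
) -> Tuple[Proportions, float]:
    """Nudge one family's factor at a time and keep a change only if the score improves."""
    best = dict(start)
    best_score = score(best)
    for _ in range(max_rounds):
        improved = False
        for family in start:
            for delta in (step, -step):
                candidate = dict(best)
                # Factors below 1.0 would undersample; this sketch only oversamples.
                candidate[family] = max(1.0, best[family] + delta)
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    best, best_score = candidate, candidate_score
                    improved = True
        if not improved:
            break  # no neighbor is better: a local optimum for this step size
    return best, best_score


# Hypothetical stand-in for classifier feedback: the score peaks at an
# assumed "ideal" set of factors that differs from the quantity-only start.
TOY_OPTIMUM = {"family_a": 1.0, "family_b": 3.0, "family_c": 5.5}


def toy_score(p: Proportions) -> float:
    return -sum((p[f] - TOY_OPTIMUM[f]) ** 2 for f in p)


counts = {"family_a": 1000, "family_b": 250, "family_c": 100}  # made-up sizes
start = initial_proportions(counts)
best, best_score = local_search(start, toy_score)
print("start:", start)
print("best: ", best, "score:", best_score)
```

In a real setting the score function would retrain or re-evaluate the classifier for each candidate, which is expensive; that cost is one reason a guided search from a sensible initial solution is preferable to an exhaustive sweep over proportions.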

Tsinghua Science and Technology
Pages 387-402
Cite this article:
Chen Y, Pang B, Shao G, et al. DGA-Based Botnet Detection Toward Imbalanced Multiclass Learning. Tsinghua Science and Technology, 2021, 26(4): 387-402. https://doi.org/10.26599/TST.2020.9010021

Received: 04 October 2019
Accepted: 05 November 2019
Published: 04 January 2021
© The author(s) 2021

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).