Alarm Log Data Augmentation Algorithm Based on a GAN Model and Apriori

Yang Yang; Yu-Ting Li; Yong-Hua Huo; Zhi-Peng Gao; Lan-Lan Rui

doi:10.1007/s11390-024-2408-1

Regular Paper

Yang Yang¹, Yu-Ting Li¹, Yong-Hua Huo², Zhi-Peng Gao¹, Lan-Lan Rui¹

¹ State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

² The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang 050081, China

Abstract

The complexity of alarm detection and diagnosis tasks often results in a lack of alarm log data. Because alarm log data carries strong rule associations, existing data augmentation algorithms perform poorly on it. To address this problem, this paper introduces a new algorithm for augmenting alarm log data, termed APRGAN, which combines a generative adversarial network (GAN) with the Apriori algorithm. APRGAN generates alarm log data under the guidance of rules mined by a rule miner. Moreover, we propose a new dynamic updating mechanism to alleviate the mode collapse problem of the GAN. In addition to updating the real reference dataset used to train the discriminator in the GAN, we dynamically update the parameters and the rule set of the Apriori algorithm according to the data generated in each epoch. Extensive experiments on two public datasets show that APRGAN outperforms other data augmentation algorithms on alarm log data, as measured by BLEU, ROUGE, and METEOR.
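To make the rule-mining idea concrete, the following is a minimal sketch of the classic Apriori procedure applied to toy alarm windows, plus one possible way mined rules could score a generated sample. It is not the authors' APRGAN implementation. The alarm names, the thresholds, the treatment of each log window as an unordered set of alarm types, and the `rule_compliance` helper are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct alarm type.
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support meets the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in sets if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def mine_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are always frequent too.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


def rule_compliance(sample, rules):
    """Hypothetical score: share of applicable rules the sample satisfies."""
    items = frozenset(sample)
    applicable = [(a, c) for a, c, _ in rules if a <= items]
    if not applicable:
        return 1.0
    return sum(1 for _, c in applicable if c <= items) / len(applicable)


# Toy alarm windows; the alarm names are invented for this example.
windows = [
    ["LINK_DOWN", "BGP_FLAP", "HIGH_LATENCY"],
    ["LINK_DOWN", "BGP_FLAP"],
    ["LINK_DOWN", "BGP_FLAP", "FAN_FAIL"],
    ["FAN_FAIL", "TEMP_HIGH"],
    ["FAN_FAIL", "TEMP_HIGH", "HIGH_LATENCY"],
]
frequent = apriori(windows, min_support=0.4)
rules = mine_rules(frequent, min_confidence=0.9)

for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
print(rule_compliance(["TEMP_HIGH", "FAN_FAIL", "LINK_DOWN"], rules))
```

In a setup like the one the abstract describes, a score of this kind could in principle be fed back to the generator alongside the discriminator signal, and the rule set could be re-mined as the reference data changes. How APRGAN actually combines these signals is specified in the paper, not in this sketch.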

Keywords

data augmentation; alarm log data; Apriori; generative adversarial network (GAN)

Electronic Supplementary Material

JCST-2204-12408-Highlights.pdf (452 KB)

Journal of Computer Science and Technology

Volume 39, Issue 4, July 2024

Pages 951-966

DOI: 10.1007/s11390-024-2408-1

Cite this article:

Yang Y, Li Y-T, Huo Y-H, et al. Alarm Log Data Augmentation Algorithm Based on a GAN Model and Apriori. Journal of Computer Science and Technology, 2024, 39(4): 951-966. https://doi.org/10.1007/s11390-024-2408-1