Regular Paper

Alarm Log Data Augmentation Algorithm Based on a GAN Model and Apriori

State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang 050081, China

Abstract

The complexity of alarm detection and diagnosis tasks often results in a shortage of alarm log data. Because alarm log data exhibit strong rule-based associations, existing data augmentation algorithms perform poorly on them. To address this problem, this paper introduces APRGAN, a new alarm log data augmentation algorithm that combines a generative adversarial network (GAN) with the Apriori algorithm. APRGAN generates alarm log data under the guidance of rules mined by an Apriori-based rule miner. Moreover, we propose a dynamic updating mechanism to alleviate the mode collapse problem of the GAN: in addition to updating the real reference dataset used to train the discriminator, we dynamically update the parameters and the rule set of the Apriori algorithm according to the data generated in each epoch. Extensive experiments on two public datasets show that APRGAN outperforms other data augmentation algorithms on alarm log data, achieving better BLEU, ROUGE, and METEOR scores.
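The abstract does not specify how the rule miner represents alarm logs or how rules steer generation. As a rough illustration only, the sketch below runs the textbook Apriori algorithm over alarm types grouped into time windows, derives association rules by confidence, and scores a generated sample by how many applicable rules it satisfies. The window contents, alarm names, thresholds, and the `rule_consistency` score are hypothetical assumptions for this sketch, not APRGAN's actual rule-guidance mechanism.

```python
from itertools import combinations

# Hypothetical alarm-log "transactions": each inner list is the set of alarm
# types observed in one time window (names are illustrative only).
windows = [
    ["LINK_DOWN", "BGP_PEER_LOSS", "ROUTE_FLAP"],
    ["LINK_DOWN", "BGP_PEER_LOSS"],
    ["CPU_HIGH", "MEM_HIGH"],
    ["LINK_DOWN", "BGP_PEER_LOSS", "CPU_HIGH"],
    ["CPU_HIGH", "MEM_HIGH", "FAN_FAIL"],
]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    candidates = {frozenset([item]) for t in sets for item in t}  # level-1 candidates
    frequent = {}
    k = 1
    while candidates:
        # Support = fraction of windows that contain the candidate itemset.
        level = {}
        for c in candidates:
            support = sum(1 for t in sets if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(level)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(s) in level for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


def rule_consistency(sample, rules):
    """Hypothetical score: share of applicable rules (lhs present) whose rhs is also present."""
    present = set(sample)
    applicable = [(lhs, rhs) for lhs, rhs, _ in rules if lhs <= present]
    if not applicable:
        return 1.0
    return sum(1 for _, rhs in applicable if rhs <= present) / len(applicable)


frequent = apriori(windows, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
print(rule_consistency(["LINK_DOWN", "BGP_PEER_LOSS"], rules))
print(rule_consistency(["LINK_DOWN", "MEM_HIGH"], rules))
```

One way to mimic the per-epoch rule-set update described in the abstract would be to re-run `apriori` on the real windows plus the generated windows kept after each epoch, possibly with adjusted `min_support` and `min_confidence`. This is an assumption about the general idea, not the paper's procedure.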

Electronic Supplementary Material

JCST-2204-12408-Highlights.pdf (452 KB)

Journal of Computer Science and Technology
Pages 951-966
Cite this article:
Yang Y, Li Y-T, Huo Y-H, et al. Alarm Log Data Augmentation Algorithm Based on a GAN Model and Apriori. Journal of Computer Science and Technology, 2024, 39(4): 951-966. https://doi.org/10.1007/s11390-024-2408-1


Received: 13 April 2022
Accepted: 12 June 2024
Published: 20 September 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024