Regular Paper

Topic Modeling Based Warning Prioritization from Change Sets of Software Repository

College of Informatics, Korea University, Seoul 02841, Korea
College of Knowledge-Based Services Engineering, Sungshin University, Seoul 02844, Korea

Abstract

Many existing warning prioritization techniques seek to reorder static analysis warnings so that true positives are presented first. However, investigating and fixing the prioritized warnings still takes an excessive amount of time, because some of them are not actually true positives or are irrelevant to the code context and topic. In this paper, we propose a warning prioritization technique that reflects the various latent topics found in bug-related code blocks. Our main aim is to build a prioritization model that maintains separate warning priorities depending on the topic of the change sets, in order to identify true positive warnings. To evaluate the proposed model, we use the warning detection rate, a performance metric widely used in warning prioritization studies, and compare the model with other competitive techniques. Additionally, we verify the effectiveness of our model by applying the technique to eight industrial projects of a real global company.

Electronic Supplementary Material

jcst-35-6-1461-Highlights.pdf (239.6 KB)

Journal of Computer Science and Technology
Pages 1461-1479
Cite this article:
Lee J-B, Lee T, In HP. Topic Modeling Based Warning Prioritization from Change Sets of Software Repository. Journal of Computer Science and Technology, 2020, 35(6): 1461-1479. https://doi.org/10.1007/s11390-020-0047-8


Received: 19 September 2019
Revised: 01 May 2020
Published: 30 November 2020
©Institute of Computing Technology, Chinese Academy of Sciences 2020