Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics

Mohammad Y. Mhawish; Manjari Gupta

doi:10.1007/s11390-020-0323-7

Regular Paper

Computer Science, Centre for Interdisciplinary Mathematical Sciences, Institute of Science, Banaras Hindu University Varanasi 221005, India

Abstract

Code smell detection is essential for improving software quality, enhancing software maintainability, and decreasing the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning model’s predictions and support their interpretability. The datasets obtained from Fontana et al. were reformed and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) perform better than kernel-based and network-based algorithms. Genetic-algorithm-based feature selection enhances the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on the grid search algorithm significantly enhances the accuracy of all these algorithms. Finally, machine learning techniques show high potential for predicting code smells, which helps to detect these smells and enhance software quality.
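The evaluation protocol named in the abstract (10-fold cross-validation combined with grid-search parameter optimization) can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the classifier is replaced by a hypothetical single-metric threshold rule rather than Random Forest, the toy data and the threshold grid are invented for the example, and tuning the parameter inside each training split is an assumption about how the two steps are combined.

```python
import random
from statistics import mean

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(threshold, rows):
    """Accuracy of the rule 'smelly if metric > threshold' on (metric, label) rows."""
    return mean(1.0 if (m > threshold) == bool(y) else 0.0 for m, y in rows)

def grid_search(rows, grid):
    """Return the grid value with the best accuracy on rows (ties: first in grid)."""
    return max(grid, key=lambda t: accuracy(t, rows))

def cross_validate(data, grid, k=10, seed=0):
    """Tune the threshold on each training split, then score the held-out fold."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        test = [data[j] for j in test_idx]
        best = grid_search(train, grid)      # parameter optimization on training data only
        scores.append(accuracy(best, test))  # evaluation on unseen data
    return mean(scores)

# Hypothetical toy data: (lines of code of a method, 1 if labelled as smelly).
data = [(5 + i, 0) for i in range(50)] + [(80 + 2 * i, 1) for i in range(50)]
grid = [20, 40, 70, 100, 150]  # assumed candidate thresholds

cv_accuracy = cross_validate(data, grid)
print(f"mean 10-fold accuracy: {cv_accuracy:.2f}")
```

The toy data is perfectly separable, so the score is only a sanity check of the procedure. Selecting the parameter on the training split alone keeps the held-out fold unseen, which is what lets the cross-validated accuracy act as an estimate of performance on new code.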

Keywords

feature selection; parameter optimization; code smell; code smell detection; prediction explanation

Electronic Supplementary Material

jcst-35-6-1428-Highlights.pdf (124.1 KB)

References

[1] Wiegers K, Beatty J. Software Requirements. Pearson Education, 2013.

[2] Chung L, do Prado Leite J C S. On non-functional requirements in software engineering. In Conceptual Modeling: Foundations and Applications-Essays in Honor of John Mylopoulos, Borgida AT, Chaudhri V, Giorgini P, Yu E (eds.), Springer, 2009, pp.363-379.

[3] Fowler M, Beck K, Brant J, Opdyke W, Roberts D. Refactoring: Improving the Design of Existing Code (1st edition). Addison-Wesley Professional, 1999.

[4] Yamashita A, Moonen L. Exploring the impact of inter-smell relations on software maintainability: An empirical study. In Proc. the 35th Int. Conf. Softw. Eng., May 2013, pp.682-691.

[5] Yamashita A, Counsell S. Code smells as system-level indicators of maintainability: An empirical study. J. Syst. Softw., 2013, 86(10): 2639-2653.

[6] Yamashita A, Moonen L. Do code smells reflect important maintainability aspects? In Proc. the 28th IEEE Int. Conf. Softw. Maintenance, September 2012, pp.306-315.

[7] Sjøberg D I K, Yamashita A, Anda B C D, Mockus A, Dybå T. Quantifying the effect of code smells on maintenance effort. IEEE Trans. Softw. Eng., 2013, 39(8): 1144-1156.

[8] Sahin D, Kessentini M, Bechikh S, Deb K. Code-smells detection as a bi-level problem. ACM Trans. Softw. Eng. Methodol., 2014, 24(1): Article No. 6.

[9] Olbrich S, Cruzes D S, Basili V, Zazworka N. The evolution and impact of code smells: A case study of two open source systems. In Proc. the 3rd International Symposium on Empirical Software Engineering and Measurement, October 2009, pp.390-400.

[10] Olbrich S M, Cruzes D S, Sjøberg D I K. Are all code smells harmful? A study of God Classes and Brain Classes in the evolution of three open source systems. In Proc. the 26th IEEE Int. Conf. Softw. Maintenance, September 2010.

[11] Khomh F, di Penta M, Guéhéneuc Y G. An exploratory study of the impact of code smells on software change-proneness. In Proc. the 16th Working Conference on Reverse Engineering, October 2009, pp.75-84.

[12] Deligiannis I, Stamelos I, Angelis L, Roumeliotis M, Shepperd M. A controlled experiment investigation of an object-oriented design heuristic for maintainability. J. Syst. Softw., 2004, 72(2): 129-143.

[13] Pérez-Castillo R, Piattini M. Analyzing the harmful effect of god class refactoring on power consumption. IEEE Softw., 2014, 31(3): 48-54.

[14] Li W, Shatnawi R. An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution. J. Syst. Softw., 2007, 80(7): 1120-1128.

[15] Ciupke O. Automatic detection of design problems in object-oriented reengineering. In Proc. the 30th International Conference on Technology of Object-Oriented Languages and Systems, Delivering Quality Software, August 1999, pp.18-32.

[16] Travassos G, Shull F, Fredericks M, Basili V R. Detecting defects in object-oriented designs: Using reading techniques to increase software quality. ACM SIGPLAN Notices, 1999, 34(10): 47-56.

[17] Dashofy E M, van der Hoek A, Taylor R N. A comprehensive approach for the development of modular software architecture description languages. ACM Trans. Softw. Eng. Methodol., 2005, 14(2): 199-245.

[18] Vidal S, Vázquez H, Díaz-Pace J A, Marcos C, Garcia A, Oizumi W. JSpIRIT: A flexible tool for the analysis of code smells. In Proc. the 34th Int. Conf. Chil. Comput. Sci. Soc., November 2016.

[19] Marinescu R. Measurement and quality in object-oriented design. In Proc. the 21st IEEE Int. Conf. Softw. Maintenance, September 2005, pp.701-704.

[20] Moha N, Guéhéneuc Y, Duchien L, le Meur A. DECOR: A method for the specification and detection of code and design smells. IEEE Trans. Softw. Eng., 2010, 36(1): 20-36.

[21] Fontana F A, Zanoni M, Marino A, Mäntylä M V. Code smell detection: Towards a machine learning-based approach. In Proc. the 2013 IEEE Int. Conf. Softw. Maintenance, September 2013, pp.396-399.

[22] Azadi U, Fontana F A, Zanoni M. Machine learning based code smell detection through WekaNose. In Proc. the 40th Int. Conf. Softw. Eng., May 2018, pp.288-289.

[23] Fontana F A, Zanoni M. Code smell severity classification using machine learning techniques. Knowledge-Based Syst., 2017, 128: 43-58.

[24] Fontana F A, Mäntylä M V, Zanoni M, Marino A. Comparing and experimenting machine learning techniques for code smell detection. Empir. Softw. Eng., 2016, 21(3): 1143-1191.

[25] Sharma T, Spinellis D. A survey on software smells. J. Syst. Softw., 2018, 138: 158-173.

[26] Rasool G, Arshad Z. A review of code smell mining techniques. J. Softw. Evol. Process, 2015, 27(11): 867-895.

[27] Fernandes E, Oliveira J, Vale G, Paiva T, Figueiredo E. A review-based comparative study of bad smell detection tools. In Proc. the 20th International Conference on Evaluation and Assessment in Software Engineering, June 2016, Article No. 18.

[28] Fontana F A, Braione P, Zanoni M. Automatic detection of bad smells in code: An experimental assessment. J. Object Technol., 2012, 11(2): Article No. 5.

[29] Ribeiro M T, Singh S, Guestrin C. “Why should I trust you?”: Explaining the predictions of any classifier. https://arxiv.org/abs/1602.04938, Oct. 2020.

[30] Chicco D. Ten quick tips for machine learning in computational biology. BioData Mining, 2017, 10(1): 35.

[31] Marinescu R. Detection strategies: Metrics-based rules for detecting design flaws. In Proc. the 20th IEEE International Conference on Software Maintenance, December 2004, pp.350-359.

[32] Abílio R, Padilha J, Figueiredo E, Costa H. Detecting code smells in software product lines: An exploratory study. In Proc. the 12th International Conference on Information Technology-New Generations, April 2015, pp.433-438.

[33] Fenske W, Schulze S. Code smells revisited: A variability perspective. In Proc. the 9th International Workshop on Variability Modelling of Software-Intensive Systems, January 2015, Article No. 3.

[34] Suryanarayana G, Samarthyam G, Sharma T. Refactoring for Software Design Smells: Managing Technical Debt (1st edition). Morgan Kaufmann, 2014.

[35] Baudry B, le Traon Y, Sunyé G, Jézéquel J M. Measuring and improving design patterns testability. In Proc. the 9th IEEE International Software Metrics Symposium, September 2003.

[36] Langelier G, Sahraoui H, Poulin P. Visualization-based analysis of quality for large-scale software systems. In Proc. the 20th IEEE/ACM International Conference on Automated Software Engineering, November 2005, pp.214-223.

[37] Murphy-Hill E, Black A P. An interactive ambient visualization for code smells. In Proc. the 5th International Symposium on Software Visualization, October 2010, pp.5-14.

[38] de Figueiredo Carneiro G, Silva M, Mara L et al. Identifying code smells with multiple concern views. In Proc. the 24th Brazilian Symposium on Software Engineering, September 2010, pp.128-137.

[39] Kreimer J. Adaptive detection of design flaws. Electron. Notes Theor. Comput. Sci., 2005, 141(4): 117-136.

[40] Amorim L, Costa E, Antunes N, Fonseca B, Ribeiro M. Experience report: Evaluating the effectiveness of decision trees for detecting code smells. In Proc. the 26th IEEE International Symposium on Software Reliability Engineering, November 2015, pp.261-269.

[41] Khomh F, Vaucher S, Guéhéneuc Y G, Sahraoui H. A Bayesian approach for the detection of code and design smells. In Proc. the 9th International Conference on Quality Software, August 2009, pp.305-314.

[42] Khomh F, Vaucher S, Guéhéneuc Y G, Sahraoui H. BDTEX: A GQM-based Bayesian approach for the detection of antipatterns. J. Syst. Softw., 2011, 84(4): 559-572.

[43] Vaucher S, Khomh F, Moha N, Guéhéneuc Y G. Tracking design smells: Lessons from a study of god classes. In Proc. the 16th Working Conference on Reverse Engineering, October 2009, pp.145-154.

[44] Hassaine S, Khomh F, Guéhéneuc Y G, Hamel S. IDS: An immune-inspired approach for the detection of software design smells. In Proc. the 7th International Conference on the Quality of Information and Communications Technology, September 2010, pp.343-348.

[45] Maiga A, Ali N, Bhattacharya N et al. Support vector machines for anti-pattern detection. In Proc. the 27th IEEE/ACM International Conference on Automated Software Engineering, September 2012, pp.278-281.

[46] Maiga A, Ali N, Bhattacharya N, Sabane A, Guéhéneuc Y G, Aimeur E. SMURF: A SVM-based incremental anti-pattern detection approach. In Proc. the 19th Working Conference on Reverse Engineering, October 2012, pp.466-475.

[47] Tempero E, Anslow C, Dietrich J et al. The Qualitas Corpus: A curated collection of Java code for empirical studies. In Proc. the 17th Asia Pacific Software Engineering Conference, November 2010, pp.336-345.

[48] Pecorelli F, Palomba F, di Nucci D, de Lucia A. Comparing heuristic and machine learning approaches for metric-based code smell detection. In Proc. the 27th Int. Conf. Progr. Compr., May 2019, pp.93-104.

[49] Wieman R. Anti-Pattern Scanner: An approach to detect anti-patterns and design violations [Master Thesis]. Department of Computer Science, Delft University of Technology, 2011.

[50] Nongpong K. Integrating “code smells” detection with refactoring tool support [Ph.D. Thesis]. University of Wisconsin-Milwaukee, 2012.

[51] Riel A J. Object-Oriented Design Heuristics (1st edition). Addison-Wesley Professional, 1996.

[52] Chawla N V, Bowyer K W, Hall L O, Kegelmeyer W P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res., 2002, 16: 321-357.

[53] Do T D, Hui S C, Fong A C M. Associative classification with prediction confidence. In Proc. the 4th International Conference on Machine Learning and Cybernetics, August 2005, pp.199-208.

[54] Malhotra R. Empirical Research in Software Engineering: Concepts, Analysis, and Applications (1st edition). Chapman and Hall/CRC, 2015.

[55] Forman G, Scholz M, Rajaram S. Feature shaping for linear SVM classifiers. In Proc. the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 2009, pp.299-308.

[56] Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recognit., 2005, 38(12): 2270-2285.

[57] Yang J, Honavar V. Feature subset selection using a genetic algorithm. IEEE Intell. Syst., 1998, 13(2): 44-49.

[58] Cassar I R, Titus N D, Grill W M. An improved genetic algorithm for designing optimal temporal patterns of neural stimulation. J. Neural Eng., 2017, 14(6): Article No. 066013.

[59] Hassanat A, Almohammadi K, Alkafaween E, Abunawas E, Hammouri A, Prasath V B. Choosing mutation and crossover ratios for genetic algorithms: A review with a new dynamic approach. Information, 2019, 10(12): Article No. 390.

[60] Hall M A. Correlation-based feature subset selection for machine learning [Ph.D. Thesis]. Department of Computer Science, The University of Waikato, 1998.

[61] Vapnik V N. An overview of statistical learning theory. IEEE Trans. Neural Networks, 1999, 10(5): 988-999.

[62] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444.

[63] Aha D W, Kibler D, Albert M K. Instance-based learning algorithms. Mach. Learn., 1991, 6(1): 37-66.

[64] Rokach L, Maimon O Z. Data Mining with Decision Trees: Theory and Applications. World Scientific, 2007.

[65] Malohlava M, Candel A, Click C, Roark H, Parmar V. Gradient boosting machine with H2O. https://www.h-2o.ai/wp-content/uploads/2018/01/GBM-BOOKLET.pdf, May 2020.

[66] Hsu C W, Chang C C, Lin C J. A practical guide to support vector classification. Technical Report, Taiwan University, 2008. https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf, May 2020.

[67] Thomas I L, Allcock G M. Determining the confidence level for a classification. Photogramm. Eng. Remote Sensing, 1984, 50(10): 1491-1496.

[68] Chakraborty S, Tomsett R, Raghavendra R et al. Interpretability of deep learning models: A survey of results. In Proc. the 2017 IEEE SmartWorld Ubiquitous Intell. Comput. Adv. and Trust. Comput. Scalable Comput. and Commun. Cloud Big Data Comput., Internet People Smart City Innov. SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI, August 2017.

[69] Guggulothu T, Moiz S A. Code smell detection using multi-label classification approach. Softw. Qual. J., 2020, 28: 1063-1086.

[70] Kiyak E O, Birant D, Birant K U. Comparison of multilabel classification algorithms for code smell detection. In Proc. the 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies, October 2019.

[71] di Nucci D, Palomba F, Tamburri D A, Serebrenik A, de Lucia A. Detecting code smells using machine learning techniques: Are we there yet? In Proc. the 25th IEEE Int. Conf. Softw. Anal. Evol. Reengineering, March 2018, pp.612-621.

Journal of Computer Science and Technology

Volume 35, Issue 6, November 2020

Pages 1428-1445

DOI: 10.1007/s11390-020-0323-7

Cite this article:

Mhawish MY, Gupta M. Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics. Journal of Computer Science and Technology, 2020, 35(6): 1428-1445. https://doi.org/10.1007/s11390-020-0323-7

Received: 24 January 2020

Revised: 29 September 2020

Published: 30 November 2020