Automated test generation tools enable test automation and alleviate the inefficiency of writing test cases by hand. However, existing tools are not yet mature enough for wide adoption by software testing groups. This paper presents an empirical study of six state-of-the-art automated test generation tools for Java: EvoSuite, Randoop, JDoop, JTeXpert, T3, and Tardis. We design a test workflow that automatically runs each tool, collects the resulting data, and evaluates a set of metrics. Using this workflow, we analyze the six tools and their underlying techniques along five dimensions: code coverage, mutation score, test suite size, readability, and real-fault detection ability. Based on the experimental results, we discuss the benefits and drawbacks of hybrid techniques. We also report our experience in setting up and running the tools, and summarize their usability and user-friendliness. Finally, we offer insights for future automated tools concerning test suite readability improvement, meaningful assertion generation, test suite reduction for random testing tools, and symbolic execution integration.
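Two of the evaluation metrics named above, code coverage and mutation score, are ratios with standard definitions: the fraction of branches a generated suite executes, and the fraction of seeded mutants it kills. As a minimal sketch (class and method names are illustrative, not taken from the paper's workflow):

```java
// Illustrative computation of two metrics used in the study.
// Branch coverage = covered branches / total branches;
// mutation score = killed mutants / total mutants.
public class TestMetrics {

    // Fraction of branches in the class under test that the suite executes.
    public static double branchCoverage(int coveredBranches, int totalBranches) {
        return totalBranches == 0 ? 1.0 : (double) coveredBranches / totalBranches;
    }

    // Fraction of generated mutants that at least one test detects (kills).
    public static double mutationScore(int killedMutants, int totalMutants) {
        return totalMutants == 0 ? 1.0 : (double) killedMutants / totalMutants;
    }

    public static void main(String[] args) {
        // A suite covering 45 of 60 branches and killing 30 of 40 mutants.
        System.out.println(branchCoverage(45, 60)); // 0.75
        System.out.println(mutationScore(30, 40));  // 0.75
    }
}
```

In practice, tools such as JaCoCo (coverage) and PIT (mutation testing) report these counts, and a workflow like the one described aggregates them per tool and per subject class.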