🤖 AI Summary
Existing evaluations of Java test-generation tools often rely on narrow, single-dimensional metrics, limiting insight into their practical utility. Method: The 2025 Java Unit Testing Competition benchmarks four tools (EVOFUZZ, EVOSUITE, BBC, and RANDOOP) on 55 Java classes drawn from six open-source projects, evaluating them along three dimensions: code coverage, mutation coverage, and the readability of the generated test cases, quantified via syntactic complexity and identifier naming quality. A standardized benchmark suite is constructed, and the evaluation combines static analysis, dynamic execution, and readability modeling. Contribution/Results: The results reveal trade-offs between coverage and maintainability: EVOFUZZ achieves +12.3% higher average mutation coverage than the baselines, while RANDOOP produces the most readable test cases. The comparison offers evidence-based guidance for test-tool selection and a more holistic methodology for assessing test generation.
📝 Abstract
This short report presents the 2025 edition of the Java Unit Testing Competition, in which four test-generation tools (EVOFUZZ, EVOSUITE, BBC, and RANDOOP) were benchmarked on a freshly selected set of 55 Java classes from six different open-source projects. The benchmarking was based on structural metrics, such as code and mutation coverage of the classes under test, as well as on the readability of the generated test cases.
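As a rough illustration of the two structural metrics mentioned above (not the competition's actual scoring pipeline), code coverage and mutation coverage are conventionally reported as simple ratios; the class and method names below are hypothetical:

```java
// Illustrative sketch, assuming the conventional definitions:
// code coverage  = covered code units / total code units,
// mutation score = killed mutants / generated mutants.
public class CoverageMetrics {

    // Fraction of code units (e.g. lines or branches) exercised by the tests.
    public static double codeCoverage(int covered, int total) {
        return total == 0 ? 0.0 : (double) covered / total;
    }

    // Fraction of seeded mutants detected ("killed") by the test suite.
    public static double mutationScore(int killed, int generated) {
        return generated == 0 ? 0.0 : (double) killed / generated;
    }

    public static void main(String[] args) {
        System.out.printf("coverage=%.2f%n", codeCoverage(80, 100));
        System.out.printf("mutation=%.2f%n", mutationScore(45, 60));
    }
}
```

A tool can score highly on one ratio and poorly on the other, which is why the competition reports both alongside readability.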