Reproduction Test Generation for Java SWE Issues

📅 2026-05-05
📈 Citations: 0
Influential: 0

📝 Abstract
Given an issue on a software repository, a reproduction test confirms its presence in the code before it gets fixed and its absence after. Reproduction tests provide crucial execution-based feedback for diagnosis and validation during software development. Unfortunately, they are usually missing. Therefore, recent work has introduced both benchmarks and a thriving literature on solutions for reproduction test generation from issues. However, that work has focused on Python and neglected other languages such as Java, which is important for enterprise software. This paper introduces both a benchmark and a solution for Java repository-level reproduction test generation. The benchmark, TDD-Bench-Java, is the first to model this problem and comprises 250 instances sourced from popular open-source repositories. The solution, e-Otter++ for Java, adapts a state-of-the-art reproduction test generator for Python to yield high performance on Java. To evaluate in an industry setting, besides empirical results with TDD-Bench-Java, this paper also presents results with a contamination-free proprietary dataset. Overall, we hope that this paper contributes to bringing better diagnosis and validation to Java software development.
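The defining property of a reproduction test (it fails on the code before the fix and passes on the code after) can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the null-handling bug, the names `trimBuggy`, `trimFixed`, `reproductionTest`, and `isFailToPass`, and the use of plain functions in place of a real test framework and repository checkout. It is not taken from TDD-Bench-Java or e-Otter++.

```java
import java.util.function.Function;
import java.util.function.Predicate;

public class ReproductionTestSketch {

    // Hypothetical pre-fix code: the imagined issue reports that a null
    // input throws NullPointerException instead of yielding "".
    public static String trimBuggy(String s) {
        return s.trim();
    }

    // Hypothetical post-fix code that resolves the imagined issue.
    public static String trimFixed(String s) {
        return s == null ? "" : s.trim();
    }

    // The reproduction test, parameterized over the implementation under
    // test. Returns true if the test passes, false if it fails or errors.
    public static boolean reproductionTest(Function<String, String> impl) {
        try {
            return "".equals(impl.apply(null));
        } catch (RuntimeException e) {
            return false;
        }
    }

    // A test qualifies as a reproduction test only if it fails on the
    // pre-fix code and passes on the post-fix code.
    public static boolean isFailToPass(
            Predicate<Function<String, String>> test,
            Function<String, String> before,
            Function<String, String> after) {
        return !test.test(before) && test.test(after);
    }

    public static void main(String[] args) {
        boolean ok = isFailToPass(
                ReproductionTestSketch::reproductionTest,
                ReproductionTestSketch::trimBuggy,
                ReproductionTestSketch::trimFixed);
        System.out.println("fail-to-pass: " + ok);
    }
}
```

In a repository-level setting the two implementations would instead be two checkouts of the project (before and after the fix), and the pass/fail signal would come from running the generated test with the project's own build and test tooling.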
Problem

Research questions and friction points this paper is trying to address.

reproduction test generation
Java
software issue
test automation
bug validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

reproduction test generation
Java
TDD-Bench-Java
e-Otter++
software debugging