SWE-Tester: Training Open-Source LLMs for Issue Reproduction in Real-World Repositories

📅 2026-01-20

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work proposes SWE-Tester, a framework that fine-tunes open-source large language models (e.g., LLaMA, Mistral) to generate issue reproduction tests from natural language issue descriptions, reducing the field's reliance on closed-source models and the limits that reliance places on reproducibility and collaboration. The models are trained on a high-quality dataset of 41K samples and evaluated on the SWT-Bench Verified benchmark. The fine-tuned models achieve absolute improvements of up to 10% in reproduction success rate and 21% in change coverage. Performance improves consistently with model size, data volume, and inference-time compute, supporting an open ecosystem for test-driven development and automated program repair.

📝 Abstract

Software testing is crucial for ensuring the correctness and reliability of software systems. Automated generation of issue reproduction tests from natural language issue descriptions enhances developer productivity by simplifying root cause analysis, promotes test-driven development ("test first, write code later"), and can improve the effectiveness of automated issue resolution systems such as coding agents. Existing methods for this task predominantly rely on closed-source LLMs, with limited exploration of open models. To address this, we propose SWE-Tester, a novel pipeline for training open-source LLMs to generate issue reproduction tests. We curate a high-quality training dataset of 41K instances from 2.6K open-source GitHub repositories and use it to train LLMs of varying sizes and families. The fine-tuned models achieve absolute improvements of up to 10% in success rate and 21% in change coverage on SWT-Bench Verified. Further analysis shows consistent improvements with increased inference-time compute, more data, and larger models. These results highlight the effectiveness of our framework for advancing open-source LLMs in this domain.
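The abstract reports two metrics, success rate and change coverage, without defining them here. The sketch below shows one plausible reading, and it is an assumption rather than the paper's stated definition: a generated test counts as a successful reproduction when it fails on the unpatched code and passes once the reference fix is applied, and change coverage is the fraction of lines touched by the fix that the generated tests execute. The names `RunOutcome`, `reproduces_issue`, `success_rate`, and `change_coverage` are hypothetical and do not come from the paper or from SWT-Bench.

```python
from dataclasses import dataclass


@dataclass
class RunOutcome:
    """Outcome of running one generated test on both versions of a repository."""
    fails_before_fix: bool  # test fails on the buggy (pre-patch) code
    passes_after_fix: bool  # test passes once the reference patch is applied


def reproduces_issue(outcome: RunOutcome) -> bool:
    # Assumed criterion: the test must flip from failing to passing.
    return outcome.fails_before_fix and outcome.passes_after_fix


def success_rate(outcomes: list[RunOutcome]) -> float:
    # Fraction of benchmark instances whose generated test reproduces the issue.
    if not outcomes:
        return 0.0
    return sum(reproduces_issue(o) for o in outcomes) / len(outcomes)


def change_coverage(changed_lines: set[int], executed_lines: set[int]) -> float:
    # Assumed definition: share of patch-modified lines executed by the tests.
    if not changed_lines:
        return 0.0
    return len(changed_lines & executed_lines) / len(changed_lines)


outcomes = [
    RunOutcome(fails_before_fix=True, passes_after_fix=True),   # reproduces
    RunOutcome(fails_before_fix=True, passes_after_fix=False),  # still failing
    RunOutcome(fails_before_fix=False, passes_after_fix=True),  # never failed
    RunOutcome(fails_before_fix=True, passes_after_fix=True),   # reproduces
]
rate = success_rate(outcomes)
coverage = change_coverage({10, 11, 12, 13}, {11, 12, 99})
```

Under this reading, a test that passes on both versions does not count, because it never exercised the faulty behavior described in the issue.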

Problem

Research questions and friction points this paper is trying to address.

issue reproduction

open-source LLMs

software testing

test generation

automated testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

issue reproduction

open-source LLMs

automated testing