🤖 AI Summary
This work addresses the difficulty of identifying a bug's root cause, which is often hindered by bug reports that lack a reproducing test case and by the high cost of writing such tests manually. To overcome this, we propose Echo, an agent that integrates automated test execution with a patch-validation feedback loop. Echo uses a code knowledge graph to improve context retrieval, refines retrieval queries automatically, validates generated tests by executing them, and applies a fail-to-pass criterion to produce a single high-quality reproducing test per issue. Evaluated on the SWT-Bench Verified benchmark, Echo achieves a 66.28% success rate, the state of the art among open-source methods, with a better cost-performance trade-off than approaches that sample and rank multiple candidate tests.
📝 Abstract
Identifying the root cause of a bug remains difficult for many developers because bug reports often lack a bug-reproducing test case that reliably triggers the failure. Manually writing such test cases is time-consuming and requires substantial effort to understand the codebase and isolate the failing behavior. To address this challenge, we propose Echo, an agent for generating issue-reproducing test cases, which advances previous work in several ways. During generation, Echo strengthens context retrieval by leveraging a code graph and a novel automatic query-refinement strategy. Echo also improves upon previous tools by automatically executing generated test cases, a first-of-its-kind feature that integrates into practical development workflows. In addition, Echo generates potential patches and uses the patched version to validate whether a candidate test meets the fail-to-pass criterion and to provide actionable feedback for refinement. Unlike prior bug-reproduction agents that sample and rank multiple candidate tests, Echo generates a single test per issue, offering a better cost-performance trade-off. Experiments on SWT-Bench Verified show that Echo establishes a new state of the art among open-source approaches, achieving a 66.28% success rate.