🤖 AI Summary
Existing automated test generation methods face dual limitations: search-based approaches produce tests with low readability, while single-shot LLM-based generation is cost-inefficient.
Method: This paper proposes the first feedback-driven, agent-based iterative test generation framework. It begins with zero-shot LLM-generated initial tests and refines them through closed-loop iteration, incorporating execution feedback and coverage signals. Automated orchestration is achieved via an agentic workflow built on the OpenHands platform.
Contribution/Results: The core innovation lies in reframing LLM-based test generation as an execution–evaluation–optimization closed loop that balances readability, effectiveness, and cost. Evaluated on TestGenEval, the method achieves 84.3% pass@1, 44.4% line coverage, and a 33.8% mutation score, at a per-file generation cost of only $0.63, significantly outperforming both search-based and single-shot LLM baselines.
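The execution–evaluation–optimization loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `generate` and `execute` callables, the `Feedback` record, and the coverage threshold are all hypothetical stand-ins for the LLM call and the test-runner/coverage tooling that TestForge orchestrates via OpenHands.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    """Signals gathered from one test-suite execution (hypothetical shape)."""
    passed: bool     # did every generated test pass?
    coverage: float  # fraction of source lines covered
    log: str         # execution/coverage output fed back to the model

def refine_tests(generate, execute, source, max_iters=5, target_cov=0.9):
    """Feedback-driven loop: zero-shot generation, then iterative refinement.

    `generate(prompt)` stands in for an LLM call returning test code;
    `execute(tests)` stands in for running the tests and collecting coverage.
    """
    # Step 1: zero-shot initial test suite.
    tests = generate(f"Write unit tests for:\n{source}")
    for _ in range(max_iters):
        # Step 2: execute the current suite and collect signals.
        fb = execute(tests)
        # Step 3: stop once the suite passes with sufficient coverage.
        if fb.passed and fb.coverage >= target_cov:
            break
        # Step 4: feed failures and coverage gaps back into regeneration.
        tests = generate(
            f"Improve these tests.\nTests:\n{tests}\n"
            f"Passed: {fb.passed}, coverage: {fb.coverage:.0%}\nLog:\n{fb.log}"
        )
    return tests
```

Separating the loop from the model and runner keeps the sketch self-contained; in practice the $0.63-per-file cost comes from bounding how many regeneration calls each file is allowed.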
📝 Abstract
Automated test generation holds great promise for alleviating the burdens of manual test creation. However, existing search-based techniques compromise on test readability, while LLM-based approaches are prohibitively expensive in practice. We present TestForge, an agentic unit testing framework designed to cost-effectively generate high-quality test suites for real-world code. Our key insight is to reframe LLM-based test generation as an iterative process. TestForge thus begins with tests generated via zero-shot prompting, and then continuously refines those tests based on feedback from test executions and coverage reports. We evaluate TestForge on TestGenEval, a real-world unit test generation benchmark sourced from 11 large-scale open-source repositories; we show that TestForge achieves a pass@1 rate of 84.3%, 44.4% line coverage, and a 33.8% mutation score on average, outperforming prior classical approaches and a one-iteration LLM-based baseline. TestForge produces more natural and understandable tests compared to state-of-the-art search-based techniques, and offers substantial cost savings over LLM-based techniques (at $0.63 per file). Finally, we release a version of TestGenEval integrated with the OpenHands platform, a popular open-source framework featuring a diverse set of software engineering agents and agentic benchmarks, for future extension and development.