TestForge: Feedback-Driven, Agentic Test Suite Generation

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing automated test generation methods face dual limitations: low readability in search-based approaches and poor cost-efficiency in single-shot LLM-based generation. Method: This paper proposes a feedback-driven, agent-based iterative test generation framework. It begins with zero-shot LLM-generated initial tests and refines them through closed-loop iteration, incorporating execution feedback and coverage signals. Automated orchestration is achieved via an agentic workflow built on the OpenHands platform. Contribution/Results: The core innovation lies in reframing LLM-based test generation as an execution–evaluation–optimization closed loop, balancing readability, effectiveness, and low cost. Evaluated on TestGenEval, the method achieves 84.3% pass@1, 44.4% line coverage, and a 33.8% mutation score, at a per-file generation cost of only $0.63, significantly outperforming both search-based and single-shot LLM baselines.

📝 Abstract
Automated test generation holds great promise for alleviating the burdens of manual test creation. However, existing search-based techniques compromise on test readability, while LLM-based approaches are prohibitively expensive in practice. We present TestForge, an agentic unit testing framework designed to cost-effectively generate high-quality test suites for real-world code. Our key insight is to reframe LLM-based test generation as an iterative process. TestForge thus begins with tests generated via zero-shot prompting, and then continuously refines those tests based on feedback from test executions and coverage reports. We evaluate TestForge on TestGenEval, a real-world unit test generation benchmark sourced from 11 large-scale open-source repositories; we show that TestForge achieves a pass@1 rate of 84.3%, 44.4% line coverage, and a 33.8% mutation score on average, outperforming prior classical approaches and a one-iteration LLM-based baseline. TestForge produces more natural and understandable tests compared to state-of-the-art search-based techniques, and offers substantial cost savings over LLM-based techniques (at $0.63 per file). Finally, we release a version of TestGenEval integrated with the OpenHands platform, a popular open-source framework featuring a diverse set of software engineering agents and agentic benchmarks, for future extension and development.
Problem

Research questions and friction points this paper is trying to address.

Search-based test generation sacrifices readability, while single-shot LLM-based generation is prohibitively expensive.
TestForge reframes LLM-based test generation as an iterative, feedback-driven process.
TestForge outperforms classical baselines and a one-iteration LLM-based baseline in both quality and cost.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative, agentic LLM-based test generation
Closed-loop refinement driven by execution results and coverage reports
Cost-effective generation of readable, high-quality test suites
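The closed loop described above (zero-shot generation, then execution-driven refinement) can be sketched in miniature. This is a hedged illustration, not TestForge's actual implementation: the helpers `generate_tests`, `run_tests`, and `refine_tests` are hypothetical stand-ins for the LLM and test-runner components, and the "refinement" here is hard-coded where the real system would re-prompt a model with the failure feedback.

```python
def generate_tests(source: str) -> str:
    """Stand-in for zero-shot LLM generation: emit an initial draft test.

    The draft is deliberately wrong to exercise the repair loop."""
    return "assert add(2, 2) == 5"

def run_tests(source: str, tests: str) -> dict:
    """Execute the tests against the code under test, returning feedback."""
    namespace: dict = {}
    exec(source, namespace)  # load the code under test
    try:
        exec(tests, namespace)  # run the candidate test suite
        return {"passed": True, "error": None}
    except AssertionError:
        return {"passed": False, "error": "assertion failed"}

def refine_tests(tests: str, feedback: dict) -> str:
    """Stand-in for an LLM repair step informed by execution feedback."""
    return "assert add(2, 2) == 4"

def forge(source: str, max_iters: int = 3) -> str:
    """Feedback-driven loop: generate, execute, and refine until tests pass."""
    tests = generate_tests(source)
    for _ in range(max_iters):
        feedback = run_tests(source, tests)
        if feedback["passed"]:
            break
        tests = refine_tests(tests, feedback)
    return tests

code = "def add(a, b):\n    return a + b"
final_tests = forge(code)
print(run_tests(code, final_tests)["passed"])  # the loop converges on passing tests
```

The real system additionally feeds coverage reports back into the refinement step and bounds the iteration count to control per-file cost.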