Planning to Explore: Curiosity-Driven Planning for LLM Test Generation

📅 2026-04-06
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a limitation of existing large language model (LLM)-based test generation methods. They rely on greedy strategies that maximize immediate coverage gain, so they struggle to cover deep code branches that require multi-step setup. The paper introduces CovQValue, which it presents as the first approach to integrate Bayesian exploration into LLM-driven test generation. It treats the program's branch structure as an unknown environment and uses an evolving coverage map as a surrogate posterior over what has been discovered so far. CovQValue prompts the LLM to generate diverse exploration plans in parallel and selects the most informative plan by LLM-estimated Q-values, balancing immediate branch discovery with long-term reachability. Compared with greedy selection, it achieves 51–77% higher branch coverage on TestGenEval Lite across three popular LLMs, winning on 77–84% of targets, and it attains 40–74% coverage on the newly introduced RepoExploreBench.
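To make the "coverage map as surrogate posterior" idea concrete, here is a minimal sketch, assuming a coverage tool that reports which branches a test executed and which branches it has seen in the code. The `CoverageMap` class, its methods, and the branch identifiers such as `parse:L10:true` are hypothetical illustrations, not the paper's implementation. The map records every known branch with a hit count (0 means discovered but never covered) and renders itself as text that could be fed back into the next LLM prompt.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageMap:
    # Hypothetical structure: branch id -> number of test runs that executed it.
    # A count of 0 means the branch is known to exist but has never been covered.
    hits: dict[str, int] = field(default_factory=dict)

    def update(self, executed: set[str], discovered: set[str]) -> int:
        """Record one test run and return how many branches it covered for the first time."""
        # Register every branch we now know about, covered or not.
        for branch in discovered | executed:
            self.hits.setdefault(branch, 0)
        newly_covered = sum(1 for branch in executed if self.hits[branch] == 0)
        for branch in executed:
            self.hits[branch] += 1
        return newly_covered

    def covered(self) -> list[str]:
        return sorted(b for b, n in self.hits.items() if n > 0)

    def uncovered(self) -> list[str]:
        return sorted(b for b, n in self.hits.items() if n == 0)

    def to_prompt(self) -> str:
        """Render the map as plain text to include in the next LLM prompt."""
        return (
            f"Covered branches: {', '.join(self.covered()) or 'none'}\n"
            f"Known but uncovered branches: {', '.join(self.uncovered()) or 'none'}"
        )


cov = CoverageMap()
first = cov.update(
    executed={"parse:L10:true"},
    discovered={"parse:L10:true", "parse:L10:false", "login:L42:true"},
)
second = cov.update(executed={"parse:L10:true"}, discovered=set())
print(first, second)  # -> 1 0
print(cov.to_prompt())
```

The second run re-executes an already covered branch, so its immediate gain is 0. This is exactly the signal a greedy strategy sees when a test only performs setup.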

๐Ÿ“ Abstract
The use of LLMs for code generation has naturally extended to code testing and evaluation. As codebases grow in size and complexity, so does the need for automated test generation. Current approaches for LLM-based test generation rely on strategies that maximize immediate coverage gain, a greedy approach that plateaus on code where reaching deep branches requires setup steps that individually yield zero new coverage. Drawing on principles of Bayesian exploration, we treat the program's branch structure as an unknown environment, and an evolving coverage map as a proxy probabilistic posterior representing what the LLM has discovered so far. Our method, CovQValue, feeds the coverage map back to the LLM, generates diverse candidate plans in parallel, and selects the most informative plan by LLM-estimated Q-values, seeking actions that balance immediate branch discovery with future reachability. Our method outperforms greedy selection on TestGenEval Lite, achieving 51–77% higher branch coverage across three popular LLMs and winning on 77–84% of targets. In addition, we build a benchmark for iterative test generation, RepoExploreBench, on which our approach achieves 40–74% coverage. These results show the potential of curiosity-driven planning methods for LLM-based exploration, enabling more effective discovery of program behavior through sequential interaction.
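The difference between greedy selection and Q-value selection can be illustrated with a toy sketch. In CovQValue the Q-values are estimated by the LLM itself; here they are replaced by hard-coded, hypothetical numbers, and the additive form Q = immediate gain + gamma × future value (with a discount `gamma`) is an assumption chosen for illustration, not the paper's formula. Parallel plan generation is omitted; the candidate plans are simply given. The `Plan` type, the plan descriptions, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    description: str
    immediate_gain: float  # hypothetical estimate: new branches covered by this step alone
    future_value: float    # hypothetical estimate: new branches this step makes reachable later


def greedy_select(plans: list[Plan]) -> Plan:
    """Greedy baseline: pick the plan with the largest immediate coverage gain."""
    return max(plans, key=lambda p: p.immediate_gain)


def q_value(plan: Plan, gamma: float = 0.9) -> float:
    """Assumed one-step form: immediate gain plus discounted future value."""
    return plan.immediate_gain + gamma * plan.future_value


def q_select(plans: list[Plan], gamma: float = 0.9) -> Plan:
    """Pick the plan with the highest (assumed) Q-value."""
    return max(plans, key=lambda p: q_value(p, gamma))


candidates = [
    Plan("call parse() with one more malformed input", immediate_gain=1.0, future_value=0.0),
    Plan("create a Session and log in before calling export()", immediate_gain=0.0, future_value=6.0),
]

print(greedy_select(candidates).description)  # -> call parse() with one more malformed input
print(q_select(candidates).description)       # -> create a Session and log in before calling export()
```

The setup plan covers nothing new on its own, so greedy selection never picks it, while its Q-value (about 5.4 versus 1.0) makes it the preferred action. With a small `gamma` such as 0.1 the Q-value ranking collapses back to the greedy choice, which shows how much the outcome depends on valuing future reachability.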
Problem

Research questions and friction points this paper is trying to address.

LLM test generation
branch coverage
greedy strategy
code exploration
automated testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

curiosity-driven planning
Bayesian exploration
LLM-based test generation
coverage map
Q-value