🤖 AI Summary
Traditional unit test generation approaches are path-insensitive, limiting their ability to explore deep control-flow paths and resulting in insufficient coverage. This paper proposes JUnitGenie, a path-sensitive test generation framework that distills structured code knowledge, extracted via static analysis, into prompt signals that guide large language models (LLMs) toward generating high-coverage test cases. It introduces context-aware prompting and path-directed generation strategies, combining the precision of program analysis with the semantic reasoning capabilities of LLMs. Evaluated on ten real-world Java projects, JUnitGenie improves branch coverage by 29.60% and line coverage by 31.00% on average over state-of-the-art baselines, and it uncovers multiple previously unknown real-world defects.
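The central mechanism, turning statically extracted path constraints into prompt signals for an LLM, can be illustrated with a minimal sketch. Everything below (the `PathConstraint` model, the prompt wording, the `build_prompt` helper) is a hypothetical illustration of the idea, not JUnitGenie's actual implementation:

```python
# Hypothetical sketch: encode a focal method's branch conditions along a
# target execution path as structured prompt text for an LLM test
# generator. Data model and prompt format are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PathConstraint:
    """One branch condition along a target execution path."""
    condition: str   # source-level condition, e.g. "v < lo"
    outcome: bool    # which side of the branch the path takes

def build_prompt(focal_method: str, path: list[PathConstraint]) -> str:
    """Distill a target path into a prompt asking the LLM for a JUnit
    test whose inputs drive execution down exactly that path."""
    steps = "\n".join(
        f"  {i + 1}. `{c.condition}` must be {str(c.outcome).lower()}"
        for i, c in enumerate(path)
    )
    return (
        "Generate a JUnit 5 test for the focal method below.\n"
        "The test inputs must satisfy, in order:\n"
        f"{steps}\n\n"
        f"Focal method:\n{focal_method}\n"
    )

focal = ("int clamp(int v, int lo, int hi) { "
         "if (v < lo) return lo; if (v > hi) return hi; return v; }")
# Target the deeper path: skip the first branch, take the second.
prompt = build_prompt(focal, [
    PathConstraint("v < lo", False),
    PathConstraint("v > hi", True),
])
print(prompt)
```

A path-insensitive generator sees only the method text; feeding the LLM an explicit, ordered list of branch outcomes like this is what lets it construct inputs that reach deep paths such as the `return hi` case.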
📝 Abstract
Unit testing is essential for software quality assurance, yet writing and maintaining tests remains time-consuming and error-prone. To address this challenge, researchers have proposed various techniques for automating unit test generation, including traditional heuristic-based methods and more recent approaches that leverage large language models (LLMs). However, these existing approaches are inherently path-insensitive: they rely on fixed heuristics or limited contextual information and fail to reason about deep control-flow structures. As a result, they often struggle to achieve adequate coverage, particularly for deep or complex execution paths. In this work, we present a path-sensitive framework, JUnitGenie, which fills this gap by combining code knowledge with the semantic capabilities of LLMs to guide context-aware unit test generation. After extracting code knowledge from Java projects, JUnitGenie distills this knowledge into structured prompts that guide the generation of high-coverage unit tests. We evaluate JUnitGenie on 2,258 complex focal methods from ten real-world Java projects. The results show that JUnitGenie generates valid tests and improves branch and line coverage by 29.60% and 31.00% on average over both heuristic- and LLM-based baselines. We further demonstrate that the generated test cases can uncover real-world bugs, which were later confirmed and fixed by developers.