🤖 AI Summary
Existing diffusion-based large language models (dLLMs) face a trade-off between efficiency and quality in unit test generation (UTG): increasing the number of tokens generated per step significantly degrades test quality. This paper proposes DiffTester, the first framework to integrate abstract syntax tree (AST) analysis into dLLM-based UTG, enabling dynamic identification and reuse of repetitive structural patterns in test code—thereby adaptively extending per-step generation length without compromising quality. The approach combines AST-driven pattern awareness, parallel sequence generation, and multi-language adaptation, and extends the TestEval benchmark to support evaluation across Python, Java, and C++. Experiments demonstrate that DiffTester achieves up to 2.3× speedup in test generation across multiple dLLMs and programming languages, while maintaining or improving branch and line coverage—validating its generalizability and practical utility.
📝 Abstract
Software development relies heavily on extensive unit testing, which makes the efficiency of automated Unit Test Generation (UTG) particularly important. However, most existing LLMs generate test cases one token at a time in each forward pass, which leads to inefficient UTG. Recently, diffusion LLMs (dLLMs) have emerged, offering promising parallel generation capabilities and showing strong potential for efficient UTG. Despite this advantage, their application to UTG is still constrained by a clear trade-off between efficiency and test quality, since increasing the number of tokens generated in each step often causes a sharp decline in the quality of test cases. To overcome this limitation, we present DiffTester, an acceleration framework specifically tailored for dLLMs in UTG. The key idea of DiffTester is that unit tests targeting the same focal method often share repetitive structural patterns. By dynamically identifying these common patterns through abstract syntax tree analysis during generation, DiffTester adaptively increases the number of tokens produced at each step without compromising the quality of the output. To enable comprehensive evaluation, we extend the original TestEval benchmark, which was limited to Python, by introducing additional programming languages including Java and C++. Extensive experiments on three benchmarks with two representative models show that DiffTester delivers significant acceleration while preserving test coverage. Moreover, DiffTester generalizes well across different dLLMs and programming languages, providing a practical and scalable solution for efficient UTG in software development. Code and data are publicly available at https://github.com/wellbeingyang/DLM4UTG-open.
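The paper does not publish its pattern-detection algorithm in this abstract, but the core intuition—that unit tests for the same focal method share repetitive AST structure—can be illustrated with a minimal sketch. The snippet below (an assumption for illustration, not DiffTester's actual implementation) flattens each test into a skeleton of AST node types and checks whether the tests share an identical structure; a pattern-aware decoder could then emit such shared token spans in a single step.

```python
import ast

def skeleton(src: str) -> tuple:
    """Flatten a test function's AST into a tuple of node-type names,
    ignoring identifiers and literal values."""
    tree = ast.parse(src)
    return tuple(type(node).__name__ for node in ast.walk(tree))

# Three hypothetical unit tests for the same focal method `add`
# (illustrative names; not from the paper's benchmark).
tests = [
    "def test_a():\n    assert add(1, 2) == 3",
    "def test_b():\n    assert add(0, 0) == 0",
    "def test_c():\n    assert add(5, 1) == 6",
]

skels = [skeleton(t) for t in tests]

# All three tests have the same structural skeleton, so their
# boilerplate (def ... assert add(..., ...) == ...) is a reusable
# pattern rather than content that must be decoded token by token.
shared = all(s == skels[0] for s in skels)
print(shared)  # True
```

In a real system the comparison would operate on partial, in-progress generations rather than complete tests, but the sketch captures why structural repetition leaves room for longer per-step generation without hurting quality.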