Clarifying Semantics of In-Context Examples for Unit Test Generation

📅 2025-10-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) often generate low-quality unit tests under in-context learning when the in-context examples are semantically ambiguous. Method: This paper proposes CLAST, a test semantic enhancement technique that combines static program analysis with LLM-driven rewriting to improve the semantic expressiveness of test cases through logical decomposition and readability improvement, while fully preserving the original tests' validity. Contribution/Results: Evaluated on four open-source and three industrial projects, using CLAST-refined tests as in-context examples yields average improvements over UTgen-refined tests of 45.99% in code coverage (Cov), 28.22% in pass rate (PR), and 25.97% in compilation success rate (CSR) for generated tests. Moreover, over 85.33% of user-study participants preferred CLAST-refined test cases for their semantic clarity.

📝 Abstract
Recent advances in large language models (LLMs) have enabled promising performance in unit test generation through in-context learning (ICL). However, the quality of in-context examples significantly influences the effectiveness of generated tests: poorly structured or semantically unclear test examples often lead to suboptimal outputs. In this paper, we propose CLAST, a novel technique that systematically refines unit tests to improve their semantic clarity, thereby enhancing their utility as in-context examples. The approach decomposes complex tests into logically clearer ones and improves semantic clarity through a combination of program analysis and LLM-based rewriting. We evaluated CLAST on four open-source and three industrial projects. The results demonstrate that CLAST largely outperforms UTgen, the state-of-the-art refinement technique, in both preserving test effectiveness and enhancing semantic clarity. Specifically, CLAST fully retains the original effectiveness of unit tests, while UTgen reduces compilation success rate (CSR), pass rate (PR), test coverage (Cov), and mutation score (MS) by an average of 12.90%, 35.82%, 4.65%, and 5.07%, respectively. Over 85.33% of participants in our user study preferred the semantic clarity of CLAST-refined tests. Notably, incorporating CLAST-refined tests as examples effectively improves ICL-based unit test generation approaches such as RAGGen and TELPA, resulting in an average increase of 25.97% in CSR, 28.22% in PR, and 45.99% in Cov for generated tests, compared to incorporating UTgen-refined tests. The insights from the follow-up user study not only reinforce CLAST's potential impact in software testing practice but also illuminate avenues for future research.
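To make the ICL setting concrete, here is a minimal sketch of how refined tests serve as in-context examples in a generation prompt. The function name `build_icl_prompt`, the prompt wording, and the toy example test are all hypothetical illustrations, not CLAST's, RAGGen's, or TELPA's actual prompt format:

```python
# Hypothetical sketch of ICL-based unit test generation:
# refined tests are placed in the prompt as examples,
# followed by the focal method the LLM should cover.

def build_icl_prompt(refined_examples, focal_method):
    """Assemble a prompt showing semantically clear tests as
    in-context examples, then ask for a test of the focal method."""
    parts = ["Here are example unit tests:"]
    parts.extend(refined_examples)
    parts.append("Write a unit test for the following method:")
    parts.append(focal_method)
    return "\n\n".join(parts)

# Toy refined example and focal method (illustrative only).
example_test = (
    "def test_add_returns_sum():\n"
    "    assert add(2, 3) == 5"
)
focal = "def multiply(a, b):\n    return a * b"

prompt = build_icl_prompt([example_test], focal)
```

The point is only the structure: clearer example tests in the prompt give the model a better template to imitate, which is where the abstract's CSR/PR/Cov gains come from.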
Problem

Research questions and friction points this paper is trying to address.

Improving semantic clarity of in-context examples for unit test generation
Refining poorly structured tests to enhance LLM-based test generation
Addressing suboptimal outputs from semantically unclear test examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes complex tests into logically clearer ones
Improves semantic clarity via program analysis and LLM rewriting
Enhances in-context examples for unit test generation
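The decomposition idea can be illustrated with a hedged sketch. CLAST's actual implementation (its program analysis and LLM rewriting steps) is not shown here; the `Stack` class and test names below are invented for illustration, using Python's `unittest` as a stand-in test framework:

```python
import unittest


class Stack:
    """Minimal stack used only to illustrate the example."""
    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


# Before: one test mixing several behaviors, so its intent is
# ambiguous when used as an in-context example.
class TestStackMonolithic(unittest.TestCase):
    def test_stack(self):
        s = Stack()
        s.push(1)
        s.push(2)
        self.assertEqual(len(s), 2)
        self.assertEqual(s.pop(), 2)
        self.assertEqual(len(s), 1)


# After: decomposed tests, each with a single clear intent stated
# in its name, mirroring the kind of refinement CLAST performs.
class TestStackDecomposed(unittest.TestCase):
    def test_push_increases_size(self):
        s = Stack()
        s.push(1)
        s.push(2)
        self.assertEqual(len(s), 2)

    def test_pop_returns_last_pushed_item(self):
        s = Stack()
        s.push(1)
        s.push(2)
        self.assertEqual(s.pop(), 2)

    def test_pop_decreases_size(self):
        s = Stack()
        s.push(1)
        s.push(2)
        s.pop()
        self.assertEqual(len(s), 1)
```

Both versions check the same behavior, which matches the paper's claim that refinement preserves original test effectiveness while improving semantic clarity.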
Authors

Chen Yang, College of Intelligence and Computing, Tianjin University, Tianjin, China
Lin Yang, College of Intelligence and Computing, Tianjin University, Tianjin, China
Ziqi Wang, College of Intelligence and Computing, Tianjin University, Tianjin, China
Dong Wang, College of Intelligence and Computing, Tianjin University, Tianjin, China
Jianyi Zhou, Peking University
Junjie Chen, College of Intelligence and Computing, Tianjin University, Tianjin, China