Improving the Readability of Automatically Generated Tests using Large Language Models

📅 2024-12-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Search-based test generators (e.g., EvoSuite) achieve high code coverage but produce tests with poor naming conventions and low readability; conversely, LLM-generated tests exhibit strong natural-language fluency yet suffer from insufficient structural coverage. Method: This paper proposes a semantics-preserving readability enhancement paradigm that synergistically integrates the high-coverage capability of search-based generation with the expressive power of large language models (LLMs). Leveraging nine industrial and open-source LLMs—including CodeLlama and GPT-series models—we apply lightweight, prompt-driven renaming of variables and methods in search-generated test cases, strictly preserving semantic behavior and code coverage. Contribution/Results: Empirical evaluation confirms semantic stability throughout renaming. A blind evaluation by ten professional developers shows no statistically significant difference in readability between our optimized tests and manually written ones, demonstrating that automated readability enhancement can match human-level clarity without compromising coverage or correctness.

📝 Abstract
Search-based test generators are effective at producing unit tests with high coverage. However, such automatically generated tests have no meaningful test and variable names, making them hard for developers to understand and interpret. On the other hand, large language models (LLMs) can generate highly readable test cases, but they cannot match the effectiveness of search-based generators in terms of achieved code coverage. In this paper, we propose to combine the effectiveness of search-based generators with the readability of LLM-generated tests. Our approach focuses on improving the test and variable names produced by search-based tools, while keeping their semantics (i.e., their coverage) unchanged. Our evaluation on nine industrial and open-source LLMs shows that our readability-improvement transformations are overall semantics-preserving and stable across multiple repetitions. Moreover, a human study with ten professional developers shows that our LLM-improved tests are as readable as developer-written tests, regardless of the LLM employed.
Problem

Research questions and friction points this paper is trying to address.

Automatic Test Code Generation
Coverage
Comprehensibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized Test Code Naming
Large Language Model Integration
Enhanced Readability and Coverage