EvoGPT: Enhancing Test Suite Robustness via LLM-Based Generation and Genetic Optimization

📅 2025-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address insufficient diversity and low fault detection rates in unit test suites, this paper proposes EvoGPT, a framework that combines LLM-based test generation with genetic optimization. Methodologically, it (1) uses a fitness function that prioritizes mutation score over traditional coverage metrics; (2) builds the initial test suites in three stages: diverse temperature sampling for behavioral diversity, an iterative generation-repair loop, and coverage-guided assertion enhancement; and (3) evolves the resulting suites with a genetic algorithm, pairing the generative capabilities of large language models with evolutionary search. Evaluated on multiple open-source Java projects, the approach improves both code coverage and mutation score by 10% on average over pure LLM-based generation and traditional search-based testing baselines. The results indicate that the generated test suites reveal more faults and are more robust.

📝 Abstract

Large Language Models (LLMs) have recently emerged as promising tools for automated unit test generation. We introduce EvoGPT, a hybrid framework that integrates LLM-based test generation with evolutionary search techniques to create diverse, fault-revealing unit tests. Unit tests are initially generated with diverse temperature sampling to maximize behavioral and test suite diversity, followed by a generation-repair loop and coverage-guided assertion enhancement. The resulting test suites are evolved using genetic algorithms, guided by a fitness function that prioritizes mutation score over traditional coverage metrics. This design emphasizes the primary objective of unit testing: fault detection. Evaluated on multiple open-source Java projects, EvoGPT achieves an average improvement of 10% in both code coverage and mutation score compared to LLM-based and traditional search-based software testing baselines. These results demonstrate that combining LLM-driven diversity, targeted repair, and evolutionary optimization produces more effective and resilient test suites.
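
The initial generation stage described in the abstract (sampling at several temperatures, then repairing tests that fail) could be sketched roughly as follows. This is an illustrative sketch, not EvoGPT's implementation: the `generate` and `check` callables, the repair prompt format, and the `max_repairs` limit are all assumptions introduced here, and the stub functions stand in for a real LLM and a real compile-and-run step.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not taken from the paper):
#   generate(prompt, temperature) -> candidate test source
#   check(test_source) -> None if the test compiles and passes, else an error message
Generate = Callable[[str, float], str]
Check = Callable[[str], Optional[str]]


def generate_with_repair(
    prompt: str,
    generate: Generate,
    check: Check,
    temperatures: List[float],
    max_repairs: int = 2,
) -> List[str]:
    """Sample one candidate per temperature and try to repair failing ones."""
    accepted: List[str] = []
    for temperature in temperatures:
        candidate = generate(prompt, temperature)
        error = check(candidate)
        attempts = 0
        while error is not None and attempts < max_repairs:
            # Feed the failure back so the next attempt can target it.
            repair_prompt = (
                f"{prompt}\n\nPrevious test:\n{candidate}\n\nError:\n{error}"
            )
            candidate = generate(repair_prompt, temperature)
            error = check(candidate)
            attempts += 1
        # Keep only valid tests, and skip exact duplicates.
        if error is None and candidate not in accepted:
            accepted.append(candidate)
    return accepted


# Stub "LLM" and "checker" used only to exercise the loop.
def fake_generate(prompt: str, temperature: float) -> str:
    if "Error:" in prompt:
        return f"test_fixed_{temperature}"
    return "test_broken" if temperature > 0.5 else f"test_ok_{temperature}"


def fake_check(test_source: str) -> Optional[str]:
    return "does not compile" if test_source == "test_broken" else None


tests = generate_with_repair("p", fake_generate, fake_check, [0.2, 0.8])
```

With these stubs, the low-temperature sample is accepted as is, while the high-temperature sample fails once and is accepted after one repair round. A candidate that still fails after `max_repairs` rounds is dropped.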

Problem

Research questions and friction points this paper is trying to address.

Enhancing test suite robustness via LLM-based generation

Combining genetic optimization with LLMs for fault detection

Improving code coverage and mutation scores in unit testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based test generation with evolutionary search

Diverse temperature sampling for test diversity

Genetic algorithms guided by mutation score fitness
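
To make the last point concrete, here is a minimal sketch of a genetic algorithm that selects a test suite from a pool of candidate tests, with a fitness that weights mutation score above coverage. Everything specific in it is an assumption for illustration: the 0.7/0.3 weights, the bit-vector encoding of a suite, truncation selection, single-point crossover, and the `CandidateTest` type are not taken from the paper. Mutation score is computed in the standard way, as killed mutants divided by total mutants.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class CandidateTest:
    name: str
    killed: FrozenSet[int]   # ids of mutants this test kills
    covered: FrozenSet[int]  # ids of lines this test covers


Suite = Tuple[bool, ...]  # one flag per test in the pool: included or not


def fitness(suite: Suite, pool: List[CandidateTest], n_mutants: int,
            n_lines: int, w_mutation: float = 0.7,
            w_coverage: float = 0.3) -> float:
    """Weighted sum of mutation score and line coverage (weights are assumed)."""
    killed, covered = set(), set()
    for include, test in zip(suite, pool):
        if include:
            killed |= test.killed
            covered |= test.covered
    return (w_mutation * len(killed) / n_mutants
            + w_coverage * len(covered) / n_lines)


def evolve(pool: List[CandidateTest], n_mutants: int, n_lines: int,
           pop_size: int = 20, generations: int = 30, seed: int = 0) -> Suite:
    """Evolve a suite; needs at least 2 tests in the pool and pop_size >= 4."""
    rng = random.Random(seed)

    def score(suite: Suite) -> float:
        return fitness(suite, pool, n_mutants, n_lines)

    population = [tuple(rng.random() < 0.5 for _ in pool)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps elites
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(pool))  # single-point crossover
            child = list(a[:cut] + b[cut:])
            flip = rng.randrange(len(pool))    # flip one inclusion flag
            child[flip] = not child[flip]
            children.append(tuple(child))
        population = parents + children
    return max(population, key=score)


pool = [
    CandidateTest("t0", frozenset({0, 1}), frozenset(range(0, 5))),
    CandidateTest("t1", frozenset({2}), frozenset(range(3, 8))),
    CandidateTest("t2", frozenset({3}), frozenset(range(8, 10))),
]
best = evolve(pool, n_mutants=4, n_lines=10)
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A fuller version would likely also penalize suite size or redundant tests, which this sketch leaves out.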

🔎 Similar Papers

Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation