🤖 AI Summary
To address insufficient diversity and low fault-detection rates in unit test suites, this paper proposes an LLM-driven genetic optimization framework for test generation. Methodologically: (1) it introduces a novel fitness function based on mutation score; (2) it establishes a three-stage pipeline comprising diverse temperature sampling to generate behaviorally varied tests, an iterative generate-and-repair loop, and coverage-guided assertion enhancement; and (3) it combines the generative capabilities of large language models with the evolutionary search of genetic algorithms. Empirical evaluation on multiple open-source Java projects shows that the approach improves both code coverage and mutation score by an average of 10% over baselines, outperforming both pure LLM-based generation and traditional search-based testing methods. The results indicate stronger fault-revealing capability and more resilient test suites.
📝 Abstract
Large Language Models (LLMs) have recently emerged as promising tools for automated unit test generation. We introduce a hybrid framework called EvoGPT that integrates LLM-based test generation with evolutionary search techniques to create diverse, fault-revealing unit tests. Unit tests are initially generated by sampling at diverse temperatures to maximize behavioral and test suite diversity, followed by a generation-repair loop and coverage-guided assertion enhancement. The resulting test suites are evolved using genetic algorithms, guided by a fitness function that prioritizes mutation score over traditional coverage metrics. This design emphasizes the primary objective of unit testing: fault detection. Evaluated on multiple open-source Java projects, EvoGPT achieves an average improvement of 10% in both code coverage and mutation score compared to LLM-based and traditional search-based software testing baselines. These results demonstrate that combining LLM-driven diversity, targeted repair, and evolutionary optimization produces more effective and resilient test suites.
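The idea of evolving test suites under a fitness function that favors mutation score over coverage can be illustrated with a minimal Python sketch. The abstract does not specify EvoGPT's actual fitness formula, weights, or genetic operators, so everything here is an assumption made for illustration: the weighted-sum form, the weight `w_mutation=0.7`, the `TestCase` record with precomputed kill and coverage sets, truncation selection, one-point crossover, and the 0.3 mutation probability are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCase:
    """Hypothetical record: which mutants a test kills and which lines it covers."""
    name: str
    killed: frozenset
    covered: frozenset


def fitness(suite, n_mutants, n_lines, w_mutation=0.7):
    """Weighted sum of mutation score and line coverage for a suite.

    w_mutation > 0.5 makes mutation score the dominant term (an assumed
    form; the paper's exact fitness function is not given in the abstract).
    """
    killed = set().union(*(t.killed for t in suite))
    covered = set().union(*(t.covered for t in suite))
    mutation_score = len(killed) / n_mutants  # killed mutants / all mutants
    coverage = len(covered) / n_lines         # covered lines / all lines
    return w_mutation * mutation_score + (1 - w_mutation) * coverage


def evolve(pool, n_mutants, n_lines, suite_size=3, pop_size=8,
           generations=20, seed=0):
    """Toy genetic algorithm over fixed-size suites drawn from a test pool."""
    rng = random.Random(seed)

    def score(suite):
        return fitness(suite, n_mutants, n_lines)

    population = [rng.sample(pool, suite_size) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, elites survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, suite_size)       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:                   # mutation: swap in a pool test
                child[rng.randrange(suite_size)] = rng.choice(pool)
            children.append(child)
        population = parents + children
    return max(population, key=score)
```

With this weighting, a suite that kills many mutants but covers few lines scores higher than one with full coverage and no kills, which mirrors the stated priority on fault detection. In a real pipeline, the kill and coverage sets would come from running a mutation testing tool and a coverage tool on each candidate test rather than from precomputed data.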