AI Summary
Research on large language models (LLMs) for unit testing is fragmented, lacking a systematic, up-to-date synthesis. Method: We conduct a systematic literature review (SLR) covering studies published through March 2025, integrating perspectives from both software engineering and foundation model research. We systematically code and analyze LLM usage patterns, adaptation strategies (e.g., prompting, fine-tuning), and hybrid approaches across core tasks, including test case generation, oracle generation, and test repair. Contribution/Results: We propose the first comprehensive taxonomy of LLM-enabled unit testing, identifying critical challenges: poor reproducibility, absence of standardized evaluation metrics, and insufficient domain adaptation. We outline future directions, including enhanced interpretability and human-in-the-loop validation. Our outputs include a structured knowledge graph and an open-source resource repository (hosted on GitHub), filling a key gap in the literature and providing a foundational benchmark and practical guidance for researchers and practitioners.
Abstract
Unit testing is a fundamental practice in modern software engineering, with the aim of ensuring the correctness, maintainability, and reliability of individual software components. Very recently, with the advances in Large Language Models (LLMs), a rapidly growing body of research has leveraged LLMs to automate various unit testing tasks, demonstrating remarkable performance and significantly reducing manual effort. However, because the LLM-based unit testing field is still being actively explored, it is challenging for researchers to understand existing achievements, open challenges, and future opportunities. This paper presents the first systematic literature review on the application of LLMs in unit testing, covering work published through March 2025. We analyze the relevant papers from the perspectives of both unit testing and LLMs. We first categorize existing unit testing tasks that benefit from LLMs, e.g., test generation and oracle generation. We then discuss several critical aspects of integrating LLMs into unit testing research, including model usage, adaptation strategies, and hybrid approaches. We further summarize key challenges that remain unresolved and outline promising directions to guide future research in this area. Overall, our paper provides the unit testing community with a systematic overview of the research landscape, helping researchers gain a comprehensive understanding of existing achievements and promoting future research. Our artifacts are publicly available at the GitHub repository: https://github.com/iSEngLab/AwesomeLLM4UT.