🤖 AI Summary
Current research on large language models (LLMs) in software testing lacks systematic organization and a structured conceptual framework. To address this gap, we conduct a semi-systematic literature review and propose, as a first for the field, a comprehensive taxonomy and development roadmap for LLM-driven testing, covering core applications such as test code generation and documentation summarization. We introduce the first LLM-centric research framework for software testing, explicitly partitioned into three pillars: model capability evaluation, test-task adaptation, and trustworthiness assurance. Our analysis identifies critical challenges, including data bias, limited interpretability, and the absence of standardized evaluation metrics, and proposes empirically verifiable directions for future work. This study provides both theoretical grounding and practical guidance, enhancing conceptual clarity and enabling more systematic, roadmap-driven technological advancement in LLM-based software testing.
📝 Abstract
Large Language Models (LLMs) are emerging as one of the most significant disruptions in the software testing field.
Specifically, they have been successfully applied to software testing tasks such as generating test code or summarizing documentation.
This potential has attracted hundreds of researchers, resulting in dozens of new contributions every month and making it hard for researchers to
stay at the forefront of the wave. Still, to the best of our knowledge, no prior work has provided a structured vision of the progress
and most relevant research trends in LLM-based testing. In this article, we aim to provide a roadmap that illustrates the current state of the field,
grouping the contributions into different categories, and also sketching the most promising and active research directions for the field.
To achieve this objective, we have conducted a semi-systematic literature review, collecting articles and mapping them into the most
prominent categories, reviewing the current and ongoing work, and analyzing the open challenges of LLM-based software testing.
Lastly, we have outlined several expected long-term impacts of LLMs on the software testing field as a whole.