Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the unreliability of large language model outputs in software test case generation and code review, which stems from hallucination. It integrates Retrieval-Augmented Generation (RAG) into software verification and validation activities: a RAG pipeline supplies the model with contextual information retrieved from external knowledge sources, with the aim of mitigating hallucinations and improving the accuracy and reliability of the generated content. Experimental results indicate that the added context generally improves both test case generation and code review, saving human testers' and inspectors' time and strengthening the overall effectiveness of software verification.


📝 Abstract

In this paper, we focus on automating two widely used Verification and Validation (V&V) activities in the Software Development Lifecycle (SDLC): software testing and software inspection (also known as review). For the former, we concentrate on automated test case generation using Large Language Models (LLMs). For the latter, we use LLMs to inspect source code. To address the known LLM hallucination problem, in which LLMs confidently produce incorrect outputs, we implement a Retrieval Augmented Generation (RAG) pipeline that integrates supplementary knowledge sources to provide additional context to the LLM. Our experimental results indicate that incorporating external context via the RAG pipeline has a generally positive impact on both test case generation and code inspection. This novel approach reduces the total project cost by saving human testers' and inspectors' time, and our experimental study indicates that it also improves the effectiveness and efficiency of these V&V activities.
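To make the RAG idea concrete, the sketch below shows only the retrieve-then-augment step: pick the knowledge-base entries most similar to the code under test or review, then prepend them to the LLM prompt. This is a minimal illustration under stated assumptions, not the authors' pipeline. The bag-of-words cosine retriever stands in for whatever retriever the paper uses (production systems usually use embeddings). The function names `retrieve` and `build_prompt`, the prompt wording, and the sample knowledge base are all hypothetical. The actual LLM call is omitted because no model or client API is specified here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into identifier-like tokens (letters, digits, underscore).
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Hypothetical retriever: rank entries by bag-of-words cosine similarity
    # to the query and keep the top k that share at least one token with it.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), i, doc)
              for i, doc in enumerate(knowledge_base)]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, stable on ties
    return [doc for score, _, doc in scored[:k] if score > 0]


def build_prompt(task: str, code: str, knowledge_base: list[str], k: int = 2) -> str:
    # Augment the task instruction with retrieved context before the code.
    # "test" = test case generation, "inspect" = code inspection (assumed labels).
    instructions = {
        "test": "Generate unit test cases for the code below.",
        "inspect": "Review the code below and list any defects.",
    }
    context = retrieve(code, knowledge_base, k)
    context_block = "\n".join(f"- {c}" for c in context) or "- (no relevant context found)"
    return f"{instructions[task]}\n\nRelevant context:\n{context_block}\n\nCode:\n{code}\n"


if __name__ == "__main__":
    # Illustrative, made-up knowledge base (e.g. requirements and coding rules).
    kb = [
        "parse_date must raise ValueError on malformed input strings.",
        "All public functions must have type hints.",
        "The billing module rounds totals to two decimal places.",
    ]
    code = "def parse_date(s):\n    return datetime.strptime(s, '%Y-%m-%d')"
    print(build_prompt("test", code, kb))
```

Here only the requirement mentioning `parse_date` shares tokens with the code, so it is the single entry injected into the prompt. That retrieved requirement gives the model a concrete expectation (raise `ValueError` on bad input) to test against instead of guessing. The same prompt builder serves both V&V activities by swapping the instruction, which mirrors the abstract's use of one RAG pipeline for testing and inspection.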

Problem

Research questions and friction points this paper is trying to address.

Software Testing

Code Inspection

Large Language Models

Hallucination

Verification and Validation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval Augmented Generation

Large Language Models

Software Testing