Understanding LLM-Driven Test Oracle Generation

📅 2025-11-19
🏛️ 2025 2nd IEEE/ACM International Conference on AI-powered Software (AIware)
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of test oracle generation in automated testing, where determining the correctness of program behavior remains difficult. It presents the first systematic evaluation of large language models (LLMs) for generating test oracles capable of exposing software defects. By integrating diverse prompt engineering strategies with contextual information, the work empirically analyzes how different prompt designs influence oracle quality. The findings reveal both the strengths and limitations of LLMs in producing accurate descriptions of expected program behavior. This research provides critical empirical evidence for LLM-driven software testing and advances the development of automated test oracle generation within the Promptware paradigm.

📝 Abstract
Automated unit test generation aims to improve software quality while reducing the time and effort required for creating tests manually. However, existing techniques primarily generate regression oracles that predicate on the implemented behavior of the class under test. They do not address the oracle problem: the challenge of distinguishing correct from incorrect program behavior. With the rise of Foundation Models (FMs), particularly Large Language Models (LLMs), there is a new opportunity to generate test oracles that reflect intended behavior. This positions LLMs as enablers of Promptware, where software creation and testing are driven by natural-language prompts. This paper presents an empirical study on the effectiveness of LLMs in generating test oracles that expose software failures. We investigate how different prompting strategies and levels of contextual input impact the quality of LLM-generated oracles. Our findings offer insights into the strengths and limitations of LLM-based oracle generation in the FM era, improving our understanding of their capabilities and fostering future research in this area.
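To make the oracle problem the abstract describes concrete, here is a minimal sketch (hypothetical code, not taken from the paper): a regression oracle derived from the implementation's observed output will pass even on buggy code, while an intended-behavior oracle derived from the specification exposes the defect.

```python
def median3(a, b, c):
    """Intended behavior: return the median of three numbers.
    Buggy implementation: returns the minimum instead."""
    return sorted([a, b, c])[0]  # defect: index 0 is the minimum, not the median

# A regression oracle asserts the implemented behavior, so it encodes the
# bug and passes on the faulty code:
regression_oracle_holds = (median3(3, 1, 2) == 1)

# An intended-behavior oracle asserts the specification, so it fails on the
# faulty code and exposes the defect:
intended_oracle_holds = (median3(3, 1, 2) == 2)

print(regression_oracle_holds, intended_oracle_holds)  # True False
```

The paper's question is, in effect, whether an LLM given the class under test and varying amounts of context will emit the second kind of assertion rather than the first.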
Problem

Research questions and friction points this paper is trying to address.

test oracle generation
oracle problem
Large Language Models
software testing
Foundation Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Test Oracle Generation
Promptware
Oracle Problem
Automated Testing