Automatic High-Level Test Case Generation using Large Language Models

📅 2025-03-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In software testing, misalignment between business requirements and test cases makes it hard to shift quality assurance left. To address this challenge, we propose a high-level test case generation method that prioritizes *requirement alignment*: (1) We construct BAlign, the first industrial-scale dataset of requirement-aligned, executable test cases grounded in real-world business semantics; (2) We apply supervised fine-tuning to open-source LLMs (LLaMA 3.1-8B, Mistral-7B) so that they automatically generate human-readable, executable test cases covering both functional points and expected outcomes. Experimental results show that the fine-tuned models significantly outperform proprietary large language models (e.g., GPT-4o, Gemini) on both automated metrics and functional correctness. Human evaluation further confirms that the generated test cases are easy to interpret in business terms and practical for engineering use, narrowing the semantic gap between requirements and testing artifacts.

📝 Abstract

We explored the challenges practitioners face in software testing and proposed automated solutions to address these obstacles. We began with a survey of local software companies and 26 practitioners, revealing that the primary challenge is not writing test scripts but aligning testing efforts with business requirements. Based on these insights, we constructed a use-case $\rightarrow$ (high-level) test-cases dataset to train/fine-tune models for generating high-level test cases. High-level test cases specify what aspects of the software's functionality need to be tested, along with the expected outcomes. We evaluated large language models, such as GPT-4o, Gemini, LLaMA 3.1 8B, and Mistral 7B, where fine-tuning (the latter two) yields improved performance. A final (human evaluation) survey confirmed the effectiveness of these generated test cases. Our proactive approach strengthens requirement-testing alignment and facilitates early test case generation to streamline development.
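
To make the use-case to test-cases mapping concrete, here is a minimal Python sketch of how one dataset record might be represented and turned into a prompt/completion pair for supervised fine-tuning. The class names, field names, prompt wording, and the login example are all hypothetical and chosen for illustration; the abstract does not specify the dataset schema or the prompt format the authors used.

```python
from dataclasses import dataclass, field
import json


@dataclass
class HighLevelTestCase:
    # What aspect of the functionality is tested (hypothetical field name).
    objective: str
    # What the software should do if it behaves correctly.
    expected_outcome: str


@dataclass
class UseCaseRecord:
    # Free-text use case, as a practitioner might write it.
    use_case: str
    test_cases: list[HighLevelTestCase] = field(default_factory=list)


def to_sft_example(record: UseCaseRecord) -> dict[str, str]:
    """Convert one record into a prompt/completion pair (assumed format)."""
    prompt = (
        "Write high-level test cases for the following use case.\n\n"
        f"Use case: {record.use_case}\n"
    )
    completion = "\n".join(
        f"{i}. Test: {tc.objective} | Expected: {tc.expected_outcome}"
        for i, tc in enumerate(record.test_cases, start=1)
    )
    return {"prompt": prompt, "completion": completion}


# Illustrative record; not taken from the paper's dataset.
record = UseCaseRecord(
    use_case="A registered user logs in with email and password.",
    test_cases=[
        HighLevelTestCase(
            "Log in with valid credentials",
            "User reaches the dashboard",
        ),
        HighLevelTestCase(
            "Log in with a wrong password",
            "An error message is shown and access is denied",
        ),
    ],
)

example = to_sft_example(record)
print(json.dumps(example, indent=2))
```

Note that each test case states what to test and the expected outcome without any test-script detail, which matches the definition of a high-level test case given in the abstract.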

Problem

Research questions and friction points this paper is trying to address.

Aligning testing efforts with business requirements in software development

Automating generation of high-level test cases using large language models

Improving requirement-testing alignment and early test case generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated high-level test case generation using LLMs

Fine-tuned LLaMA and Mistral for improved performance

Dataset-driven alignment of testing with business requirements

🔎 Similar Papers

Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation