🤖 AI Summary
In software testing, misalignment between business requirements and test cases impedes effective shift-left quality assurance. To address this core challenge, we propose a high-level test case generation method prioritizing *requirement alignment*: (1) We construct the first industrial-scale dataset, BAlign, comprising requirement-aligned, executable test cases grounded in real-world business semantics; (2) We adapt open-source LLMs (LLaMA 3.1-8B, Mistral-7B) via supervised fine-tuning to automatically generate human-readable, executable test cases that comprehensively cover functional points and expected outcomes. Experimental results demonstrate that our fine-tuned models significantly outperform proprietary large language models (e.g., GPT-4o, Gemini) in both automated metrics and functional correctness. Human evaluation further confirms that the generated test cases exhibit high business interpretability and engineering practicality, effectively bridging the semantic gap between requirements and testing artifacts.
📝 Abstract
We explored the challenges practitioners face in software testing and proposed automated solutions to address these obstacles. We began with a survey of local software companies and 26 practitioners, revealing that the primary challenge is not writing test scripts but aligning testing efforts with business requirements. Based on these insights, we constructed a use-case $\rightarrow$ (high-level) test-cases dataset to train or fine-tune models for generating high-level test cases. High-level test cases specify what aspects of the software's functionality need to be tested, along with the expected outcomes. We evaluated large language models, such as GPT-4o, Gemini, LLaMA 3.1 8B, and Mistral 7B; fine-tuning the latter two yielded improved performance. A final human-evaluation survey confirmed the effectiveness of the generated test cases. Our proactive approach strengthens requirement-testing alignment and facilitates early test case generation to streamline development.