Acceptance Test Generation with Large Language Models: An Industrial Case Study

📅 2025-04-09
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This paper addresses the limited application of large language models (LLMs) to acceptance testing. The authors propose a two-stage method for generating executable acceptance tests for industrial-grade web applications: first, generating Gherkin-formatted test scenarios from user stories; second, compiling those scenarios into executable Cypress scripts by leveraging the HTML structure of the pages under test. The approach is implemented in two complementary tools—AutoUAT (scenario generation) and Test Flow (script generation)—combining HTML-structure-aware prompt engineering with Gherkin grammar parsing to balance natural-language fidelity with engineering executability. In the evaluation, domain experts found 95% of AutoUAT-generated test scenarios helpful, and 92% of Test Flow–generated test cases were considered helpful, with 60% requiring no modification before execution. The results suggest that, with appropriate tooling and supervision, LLMs can improve the acceptance test process in real-world web development workflows.
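As an illustration of the first stage, a user story such as "As a customer, I want to log in so that I can see my orders" could yield a Gherkin scenario of roughly the following shape (a hypothetical example, not taken from the paper's evaluation):

```gherkin
Feature: Customer login

  Scenario: Successful login with valid credentials
    Given I am on the login page
    When I type "ana@example.com" into the email field
    And I type "s3cret!" into the password field
    And I click the login button
    Then I should see "My Orders"
```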

📝 Abstract
Large language model (LLM)-powered assistants are increasingly used for generating program code and unit tests, but their application in acceptance testing remains underexplored. To help address this gap, this paper explores the use of LLMs for generating executable acceptance tests for web applications through a two-step process: (i) generating acceptance test scenarios in natural language (in Gherkin) from user stories, and (ii) converting these scenarios into executable test scripts (in Cypress), knowing the HTML code of the pages under test. This two-step approach supports acceptance test-driven development, enhances tester control, and improves test quality. The two steps were implemented in the AutoUAT and Test Flow tools, respectively, powered by GPT-4 Turbo, and integrated into a partner company's workflow and evaluated on real-world projects. The users found the acceptance test scenarios generated by AutoUAT helpful 95% of the time, even revealing previously overlooked cases. Regarding Test Flow, 92% of the acceptance test cases generated by Test Flow were considered helpful: 60% were usable as generated, 8% required minor fixes, and 24% needed to be regenerated with additional inputs; the remaining 8% were discarded due to major issues. These results suggest that LLMs can, in fact, help improve the acceptance test process with appropriate tooling and supervision.
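The second step can be pictured as mapping each Gherkin step onto a Cypress command using selectors derived from the page's HTML. The sketch below is a deliberately simplified, hypothetical illustration of that idea (a regex-based `stepToCypress` helper and an invented selector map), not the paper's actual GPT-4-Turbo-powered Test Flow implementation:

```javascript
// Minimal sketch of the Gherkin-to-Cypress idea: map a natural-language
// step onto a Cypress command using selectors extracted from the page HTML.
// The step patterns and selector map are hypothetical, for illustration only.
function stepToCypress(step, selectors) {
  let m;
  if ((m = step.match(/^(?:When |And )?I type "(.+)" into the (\w+) field$/))) {
    return `cy.get('${selectors[m[2]]}').type('${m[1]}');`;
  }
  if ((m = step.match(/^(?:When |And )?I click the (\w+) button$/))) {
    return `cy.get('${selectors[m[1]]}').click();`;
  }
  if ((m = step.match(/^Then I should see "(.+)"$/))) {
    return `cy.contains('${m[1]}').should('be.visible');`;
  }
  throw new Error(`Unrecognized step: ${step}`);
}

// Selector map as it might be derived from a login page's HTML:
const selectors = {
  email: '#email',
  password: '#password',
  login: 'button[type=submit]',
};

console.log(stepToCypress('When I type "ana@example.com" into the email field', selectors));
// → cy.get('#email').type('ana@example.com');
```

In Test Flow proper, the LLM performs this mapping with the page's full HTML as context, so it can handle steps well beyond a fixed set of patterns like these.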
Problem

Research questions and friction points this paper is trying to address.

Exploring LLMs for generating executable acceptance tests in web applications
Converting natural language test scenarios into executable scripts with HTML context
Evaluating LLM-powered tools for acceptance test-driven development and quality improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM generates Gherkin scenarios from user stories
Converts Gherkin to Cypress scripts using HTML
Integrates GPT-4 Turbo in AutoUAT and Test Flow
Margarida Ferreira
Instituto Superior Técnico
Luis Viegas
Critical TechWorks and Faculty of Engineering, University of Porto, Porto, Portugal
João Pascoal Faria
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
B. Lima
LIACC, Faculty of Engineering, University of Porto, Porto, Portugal