Automated Testing of the GUI of a Real-Life Engineering Software using Large Language Models

📅 2025-03-31
🏛️ International Conference on Software Testing, Verification and Validation Workshops
📈 Citations: 0
Influential: 0

🤖 AI Summary
Problem: Manual GUI exploration for engineering software is costly, while conventional automated testing relies on predefined scripts and struggles to detect counterintuitive behavior at the semantic level.
Method: This paper proposes the first scriptless, semantics-driven LLM-based exploratory testing method tailored to real-world industrial engineering software. Leveraging multi-stage prompt engineering, GUI-state awareness, and task-oriented path generation, it employs GPT-series large language models to autonomously comprehend interface semantics and generate valid test actions.
Contribution/Results: Evaluated on a real-world use case of an engineering application that had already been extensively tested by developers and users, the method identified several genuine UI issues, particularly in regions overlooked by human testers, and improved detection efficiency for counterintuitive interactions and interface inconsistencies. This work provides the first empirical validation of LLMs’ feasibility and practicality in scriptless GUI exploratory testing.
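The loop described above (observe the GUI state, prompt the model, execute the suggested action, collect reported issues) can be illustrated with a minimal sketch. This is not GERALLT's actual architecture, which the summary does not detail. Every name here (`GuiState`, `build_prompt`, `explore`, `fake_llm`, the `ISSUE:`/`CLICK:` reply format) is a hypothetical choice made for illustration, and the LLM is replaced by a deterministic stub so the example runs without any API access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class GuiState:
    """Simplified snapshot of one window: its title and visible widget labels."""
    title: str
    widgets: List[str]


@dataclass
class ExplorationLog:
    """Actions taken and issues reported during one exploration run."""
    actions: List[str] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)


def build_prompt(state: GuiState, history: List[str]) -> str:
    """Describe the current window and past actions in plain text (assumed format)."""
    return (
        "You are exploring a GUI as a first-time user.\n"
        f"Window: {state.title}\n"
        f"Widgets: {', '.join(state.widgets)}\n"
        f"Previous actions: {'; '.join(history) or 'none'}\n"
        "Reply with optional 'ISSUE: <text>' lines, "
        "then one 'CLICK: <widget>' line, or 'DONE'."
    )


def explore(
    start: GuiState,
    transitions: Dict[Tuple[str, str], GuiState],
    ask_llm: Callable[[str], str],
    max_steps: int = 5,
) -> ExplorationLog:
    """Run a scriptless exploration: the model, not a script, picks each action."""
    log = ExplorationLog()
    state = start
    for _ in range(max_steps):
        reply = ask_llm(build_prompt(state, log.actions))
        clicked = None
        for line in reply.splitlines():
            line = line.strip()
            if line.startswith("ISSUE:"):
                log.issues.append(line[len("ISSUE:"):].strip())
            elif line.startswith("CLICK:"):
                target = line[len("CLICK:"):].strip()
                # Only accept widgets that exist in the current window.
                if target in state.widgets:
                    clicked = target
                    break
        if clicked is None:
            break  # the model said DONE or proposed no valid action
        log.actions.append(f"{state.title} -> {clicked}")
        # Stay in the same window if the click has no modeled transition.
        state = transitions.get((state.title, clicked), state)
    return log


# Toy application model and a stub standing in for a real LLM call.
MAIN = GuiState("Main", ["Settings", "Run"])
SETTINGS = GuiState("Settings", ["OK", "Apply", "Close"])
TRANSITIONS = {("Main", "Settings"): SETTINGS, ("Settings", "Close"): MAIN}


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for the model; replies depend only on the prompt."""
    if "Window: Settings" in prompt:
        return "ISSUE: 'OK' and 'Apply' look redundant\nCLICK: Close"
    if "Main -> Settings" in prompt:
        return "DONE"
    return "CLICK: Settings"
```

Calling `explore(MAIN, TRANSITIONS, fake_llm)` walks from the main window into the settings dialog and back, then stops. In a real setup, `ask_llm` would wrap a model API call and `transitions` would be replaced by driving the actual application. Filtering out widgets that are not on screen is one simple guard against the model proposing actions that do not exist.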

📝 Abstract
One important step in software development is testing the finished product with actual users. These tests aim, among other goals, at determining unintuitive behavior of the software as it is presented to the end-user. Moreover, they aim to determine inconsistencies in the user-facing interface. They provide valuable feedback for the development of the software, but are time-intensive to conduct. In this work, we present GERALLT, a system that uses Large Language Models (LLMs) to perform exploratory tests of the Graphical User Interface (GUI) of a real-life engineering software. GERALLT automatically generates a list of potentially unintuitive and inconsistent parts of the interface. We present the architecture of GERALLT and evaluate it on a real-world use case of the engineering software, which has been extensively tested by developers and users. Our results show that GERALLT is able to determine issues with the interface that support the software development team in future development of the software.
Problem

Research questions and friction points this paper is trying to address.

Automating GUI testing for engineering software using LLMs
Identifying unintuitive and inconsistent interface behaviors automatically
Reducing time-intensive manual testing in software development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs for GUI exploratory testing
Automatically detects unintuitive interface parts
Evaluated on real-world engineering software