Temac: Multi-Agent Collaboration for Automated Web GUI Testing

📅 2025-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Automated web GUI testing (AWGT) for complex web applications suffers from inadequate functional coverage, low success rates when LLMs directly execute GUI tasks, and inefficient context modeling. Method: This paper proposes Temac, an LLM-driven multi-agent collaborative testing framework built around a three-phase “exploration–summarization–targeted execution” workflow, in which LLMs help testing reach complex functionalities and the information discovered during testing is fed back to the LLMs as domain knowledge. Specialized agents with distinct roles summarize the collected information into a knowledge base, infer not-covered functionalities, and then target and execute them. Results: On six complex open-source web applications, Temac exceeds state-of-the-art approaches by 12.5% to 60.3% in average code coverage. On the top 20 real-world web applications, it reveals 445 unique failures.

📝 Abstract

Quality assurance of web applications is critical, as web applications play an essential role in people's daily lives. To reduce labor costs, automated web GUI testing (AWGT) is widely adopted, exploring web applications via GUI actions such as clicks and text inputs. However, existing AWGT approaches struggle to generate continuous and meaningful action sequences capable of covering complex functionalities. Recent work incorporates large language models (LLMs) into GUI testing, but these LLM-based approaches face their own challenges, including the low efficiency of LLMs, the high complexity of rich web application contexts, and the low success rate of LLMs in executing GUI tasks. To address these challenges, in this paper, we propose Temac, an approach that enhances AWGT using LLM-based multi-agent collaboration to increase code coverage. Temac is motivated by our insight that LLMs can enhance AWGT in executing complex functionalities, while the information discovered during AWGT can, in turn, be provided as domain knowledge to improve LLM-based task execution. Specifically, given a web application, Temac initially runs an existing approach to broadly explore application states. When the testing coverage stagnates, Temac employs LLM-based agents to summarize the collected information into a knowledge base and to infer not-covered functionalities. Guided by this knowledge base, Temac finally uses specialized LLM-based agents to target and execute the not-covered functionalities, reaching deeper states beyond those explored by the existing approach. Our evaluation results show that Temac exceeds state-of-the-art approaches by 12.5% to 60.3% in average code coverage on six complex open-source web applications, while revealing 445 unique failures in the top 20 real-world web applications. These results demonstrate the effectiveness and the general applicability of Temac.
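
The three-phase workflow described in the abstract (broad exploration until coverage stagnates, summarization into a knowledge base, then targeted execution of not-covered functionalities) can be illustrated with a minimal Python sketch. This is not Temac's implementation: every name here (`KnowledgeBase`, `coverage_stagnated`, `run_pipeline`, the callback parameters, and the stagnation window of 3 steps) is a hypothetical placeholder, and the explorer and LLM agents are assumed to be supplied by the caller as plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class KnowledgeBase:
    """Hypothetical container for what exploration has discovered."""
    covered: Set[str] = field(default_factory=set)
    summaries: List[str] = field(default_factory=list)


def coverage_stagnated(history: List[int], window: int = 3) -> bool:
    """Return True when coverage has not grown over the last `window` steps."""
    if len(history) <= window:
        return False
    return history[-1] == history[-1 - window]


def run_pipeline(
    explore_step: Callable[[], Set[str]],
    summarize: Callable[[Set[str]], str],
    infer_uncovered: Callable[[KnowledgeBase], List[str]],
    execute: Callable[[str, KnowledgeBase], bool],
    max_steps: int = 100,
) -> Set[str]:
    """Assumed control flow: explore, summarize, then run targeted tasks."""
    kb = KnowledgeBase()
    history: List[int] = []

    # Phase 1: broad exploration with an existing (non-LLM) testing approach.
    for _ in range(max_steps):
        kb.covered |= explore_step()
        history.append(len(kb.covered))
        if coverage_stagnated(history):
            break

    # Phase 2: agents summarize findings and infer not-covered functionalities.
    kb.summaries.append(summarize(kb.covered))
    targets = infer_uncovered(kb)

    # Phase 3: specialized agents attempt each inferred functionality.
    for target in targets:
        if execute(target, kb):
            kb.covered.add(target)

    return kb.covered
```

In this sketch the knowledge base is passed to both the inference and execution callbacks, which mirrors the abstract's point that information gathered during testing guides the later LLM-based steps. A real system would measure code coverage rather than counting named functionalities.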

Problem

Research questions and friction points this paper is trying to address.

Enhances automated web GUI testing with multi-agent collaboration

Addresses low efficiency and success rate of LLMs in GUI tasks

Improves code coverage by targeting unexplored functionalities

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based multi-agent collaboration for testing

Knowledge base from explored application states

Targeted execution of not-covered functionalities
