Improving Random Testing via LLM-powered UI Tarpit Escaping for Mobile Apps

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of random GUI testing: exploration often becomes trapped in UI tarpits, yielding insufficient coverage and missed defects. To overcome this, the authors introduce large language models (LLMs) into random testing for the first time, proposing two hybrid tools—HybridMonkey and HybridDroidbot—that monitor UI similarity to detect tarpits and leverage LLMs to recommend effective events that escape the trapped local regions, thereby improving exploration efficiency. Evaluated on twelve real-world applications, the approaches achieve average activity coverage improvements of 54.8% and 44.8%, respectively, and uncover 75 unique bugs—including 34 previously unknown ones—26 of which have been confirmed and fixed. Experiments on WeChat further demonstrate significant superiority over conventional random testing.
📝 Abstract
Random GUI testing is a widely used technique for testing mobile apps. However, its effectiveness is limited by a notorious issue -- UI exploration tarpits, where the exploration is trapped in local UI regions, thus impeding test coverage and bug discovery. In this experience paper, we introduce LLM-powered random GUI testing, a novel hybrid testing approach to mitigating UI tarpits during random testing. Our approach monitors UI similarity to identify tarpits and queries LLMs to suggest promising events for escaping the encountered tarpits. We implement our approach on top of two different automated input generation (AIG) tools for mobile apps: (1) HybridMonkey upon Monkey, a state-of-the-practice tool; and (2) HybridDroidbot upon Droidbot, a state-of-the-art tool. We evaluated them on 12 popular, real-world apps. The results show that HybridMonkey and HybridDroidbot outperform all baselines, achieving average coverage improvements of 54.8% and 44.8%, respectively, and detecting the highest number of unique crashes. In total, we found 75 unique bugs, including 34 previously unknown bugs. To date, 26 bugs have been confirmed and fixed. We also applied HybridMonkey on WeChat, a popular industrial app with billions of monthly active users. HybridMonkey achieved higher activity coverage and found more bugs than random testing.
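The abstract describes detecting tarpits by monitoring UI similarity before querying an LLM for escape events. As a rough illustration of that detection step only, here is a minimal Python sketch: it hashes each observed UI state into a signature and flags a likely tarpit when too few distinct states appear in a recent window. The class name `TarpitDetector`, the signature function, and the window/threshold values are illustrative assumptions, not details taken from the paper.

```python
from collections import deque


def ui_signature(widgets):
    """Reduce a UI state to a hashable signature: the set of visible widget ids.
    (Assumed abstraction; the paper's actual similarity metric may differ.)"""
    return frozenset(widgets)


class TarpitDetector:
    """Flag a likely UI exploration tarpit when recent states are too similar.

    Heuristic: keep the last `window` state signatures; if the fraction of
    distinct signatures drops below `min_distinct_ratio`, exploration is
    probably circling a small local region and an escape event is needed.
    """

    def __init__(self, window=10, min_distinct_ratio=0.3):
        self.window = window
        self.min_distinct_ratio = min_distinct_ratio
        self.recent = deque(maxlen=window)

    def observe(self, widgets):
        """Record one UI state; return True if a tarpit is suspected."""
        self.recent.append(ui_signature(widgets))
        if len(self.recent) < self.window:
            return False  # not enough history to judge yet
        distinct = len(set(self.recent))
        return distinct / len(self.recent) < self.min_distinct_ratio


detector = TarpitDetector(window=5, min_distinct_ratio=0.5)
# Simulate exploration bouncing between two nearly identical screens.
trapped = False
for i in range(10):
    screen = ["btn_ok", "list"] if i % 2 else ["btn_ok", "list", "banner"]
    trapped = detector.observe(screen)
print(trapped)  # 2 distinct signatures in a window of 5 -> 0.4 < 0.5 -> True
```

In the full approach, a positive detection would trigger an LLM query (given the current UI context) to propose an event likely to leave the local region; that prompting step is omitted here.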
Problem

Research questions and friction points this paper is trying to address.

UI exploration tarpits
random GUI testing
test coverage
mobile app testing
bug discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered testing
UI tarpit escaping
random GUI testing
mobile app testing
test coverage improvement
Mengqian Xu
East China Normal University
Yiheng Xiong
East China Normal University
Le Chang
East China Normal University
Ting Su
East China Normal University, China
Software Analysis, Testing, Verification
Chengcheng Wan
East China Normal University
Software engineering, system optimization, machine learning
Weikai Miao
East China Normal University