Can You Mimic Me? Exploring the Use of Android Record&Replay Tools in Debugging

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study examines the limited effectiveness of Android record-and-replay (R&R) tools in realistic debugging scenarios, including non-crashing functional bugs, crashing bugs, and feature-based user scenarios. It presents an empirical study of one industrial and three academic R&R tools on a benchmark of 34 user scenarios from 17 apps, 90 non-crashing failures from 42 apps, and 31 crashing bugs from 17 apps, and additionally explores combining R&R with automated input generation (AIG) tools to replay crashing bugs. The analysis identifies three main bottlenecks: action interval resolution, API incompatibility, and limitations of Android tooling. Results show that 17% of user scenarios, 38% of non-crashing bugs, and 44% of crashing bugs cannot be reliably recorded and replayed. These findings provide root-cause insights into replay failures and empirically grounded design guidelines for more reliable next-generation R&R and UI testing tools.

📝 Abstract
Android User Interface (UI) testing is a critical research area due to the ubiquity of apps and the challenges faced by developers. Record and replay (R&R) tools facilitate manual and automated UI testing by recording UI actions to execute test scenarios and replay bugs. These tools typically support (i) regression testing, (ii) non-crashing functional bug reproduction, and (iii) crashing bug reproduction. However, prior work only examines these tools in fragmented settings, lacking a comprehensive evaluation across common use cases. We address this gap by conducting an empirical study on using R&R tools to record and replay non-crashing failures, crashing bugs, and feature-based user scenarios, and explore combining R&R with automated input generation (AIG) tools to replay crashing bugs. Our study involves one industrial and three academic R&R tools, 34 scenarios from 17 apps, 90 non-crashing failures from 42 apps, and 31 crashing bugs from 17 apps. Results show that 17% of scenarios, 38% of non-crashing bugs, and 44% of crashing bugs cannot be reliably recorded and replayed, mainly due to action interval resolution, API incompatibility, and Android tooling limitations. Our findings highlight key future research directions to enhance the practical application of R&R tools.
Problem

Research questions and friction points this paper is trying to address.

Evaluating Android R&R tools' effectiveness in UI testing
Assessing R&R tools for non-crashing and crashing bug reproduction
Exploring R&R and AIG integration for bug replay reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Empirical study on Android R&R tools
Combining R&R with automated input generation
Evaluating R&R across diverse bug scenarios
👥 Authors
Zihe Song — University of Texas at Dallas (Computer Science · Software Testing · Imitation Learning)
S. M. H. Mansur — George Mason University, Fairfax, VA, USA
Ravishka Rathnasuriya — The University of Texas at Dallas (Software Engineering · AI4SE · SE4AI · Program Analysis · Adversarial Machine Learning)
Yumna Fatima — George Mason University, Fairfax, VA, USA
Wei Yang — University of Texas at Dallas, Richardson, TX, USA
Kevin Moran — University of Central Florida, Orlando, FL, USA
Wing Lam — George Mason University, Fairfax, VA, USA