🤖 AI Summary
This work addresses the challenge of validating system resilience in mobile applications, where traditional chaos engineering struggles with the combinatorial explosion of user journeys, geographic variability, and backend failure modes. It presents an AI-driven, large-scale mobile chaos testing framework that integrates DragonCrawl, an LLM-powered mobile testing platform, with uHavoc, a service-level fault injection system. The approach enables adaptive exploration and automated testing of critical user flows under realistic backend degradation, eliminating the need for manually authored test cases. It uncovers architectural dependency violations, where failures in non-critical services degrade core flows, as well as application crashes that backend testing alone cannot detect. Deployed across Uber’s Rider, Driver, and Eats applications since Q1 2024, the framework has executed over 180,000 tests covering 47 critical user flows and identified 23 resilience risks, including 12 severe enough to block trip requests or food orders. Automated root cause analysis attributes mobile failures to specific backend services with 88% precision@5, and the tests remain 99% reliable under fault injection.
📝 Abstract
Mobile applications in large-scale distributed systems are susceptible to backend service failures, yet traditional chaos engineering approaches do not scale to mobile testing because of the combinatorial explosion of flows, locations, and failure scenarios that need validation. We present an automated mobile chaos testing system that integrates DragonCrawl, an LLM-based mobile testing platform, with uHavoc, a service-level fault injection system. The key insight is that adaptive AI-driven test execution can navigate mobile applications under degraded backend conditions, eliminating the need to manually write test cases for each combination of user flow, city, and failure type. Since Q1 2024, our system has executed over 180,000 automated chaos tests across 47 critical flows in Uber's Rider, Driver, and Eats applications, the equivalent of approximately 39,000 hours of manual testing, an effort that would be impractical at this scale. We identified 23 resilience risks, 70% of which were architectural dependency violations in which non-critical service failures degraded core user flows. Twelve issues were severe enough to prevent trip requests or food orders. Two caused application crashes that were detectable only through mobile chaos testing, not through backend testing alone. Automated root cause analysis reduced debugging time from hours to minutes, achieving 88% precision@5 in attributing mobile failures to specific backend services. This paper presents the system design, evaluates its performance under fault injection (maintaining 99% test reliability), and reports operational experience demonstrating that continuous mobile resilience validation is achievable at production scale.
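The abstract reports 88% precision@5 for root cause attribution but does not define the metric here. One common reading, assumed in the sketch below, is the fraction of failed tests for which the truly responsible backend service appears among the top five ranked candidates. The function name `hit_rate_at_k` and all service names are hypothetical illustrations, not identifiers from the paper or from Uber's systems.

```python
from typing import Sequence


def hit_rate_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    true_causes: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of failures whose true root cause is in the top-k candidates.

    ASSUMPTION: this is one plausible interpretation of "precision@5";
    the paper's exact definition may differ.
    """
    if len(ranked_candidates) != len(true_causes):
        raise ValueError("need exactly one ranking per failure")
    if not true_causes:
        raise ValueError("need at least one failure to score")
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_causes)
        if truth in ranked[:k]
    )
    return hits / len(true_causes)


# Hypothetical data: one ranked list of suspect services per failed test,
# ordered from most to least likely.
rankings = [
    ["promo-service", "maps-service", "pricing-service"],
    ["chat-service", "payments-service"],
    ["ads-service", "a", "b", "c", "d", "pricing-service"],
    ["eta-service"],
]
truths = ["maps-service", "payments-service", "pricing-service", "eta-service"]

print(hit_rate_at_k(rankings, truths, k=5))  # 0.75
```

In the third failure the true cause is ranked sixth, so it counts as a miss at k=5. Under this reading, a score of 88% would mean that an engineer inspecting at most five suggested services finds the responsible one in roughly seven out of eight failures.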