XTrace: A Non-Invasive Dynamic Tracing Framework for Android Applications in Production

📅 2025-12-25
🤖 AI Summary
“Ghost bugs” (intermittent, hard-to-reproduce crashes in production Android apps) elude traditional post-crash analysis because real-time execution context is missing. Method: We propose a non-invasive dynamic tracing framework that requires no new app release. It builds on a deeply optimized use of ART's built-in instrumentation mechanism and integrates bytecode-level non-invasive instrumentation, lightweight hook scheduling, and context snapshot compression. Contribution/Results: Our approach enables interception and tracing of arbitrary method calls in production with low overhead and high stability: <7 ms app startup latency, <0.01 ms per-call overhead, and no statistically significant impact on crash or ANR rates (p > 0.05). Deployed in an Android app with hundreds of millions of daily active users, it diagnosed over 11 severe production crashes and improved root-cause localization efficiency by more than 90%.

📝 Abstract
As the complexity of mobile applications grows exponentially and the fragmentation of user device environments intensifies, ensuring online application stability faces unprecedented challenges. Traditional methods, such as static logging and post-crash analysis, lack real-time contextual information, rendering them ineffective against "ghost bugs" that only manifest in specific scenarios. This highlights an urgent need for dynamic runtime observability: intercepting and tracing arbitrary methods in production without requiring an app release. We propose XTrace, a novel dynamic tracing framework. XTrace introduces a new paradigm of non-invasive proxying, which avoids direct modification of the virtual machine's underlying data structures. It achieves high-performance method interception by leveraging and optimizing the highly stable, built-in instrumentation mechanism of the Android ART virtual machine. Evaluated in a ByteDance application with hundreds of millions of daily active users, XTrace demonstrated production-grade stability and performance. Large-scale online A/B experiments confirmed its stability, showing no statistically significant impact (p > 0.05) on Crash User Rate or ANR rate, while maintaining minimal overhead (<7 ms startup latency, <0.01 ms per-method call) and broad compatibility (Android 5.0-15+). Critically, XTrace diagnosed over 11 severe online crashes and multiple performance bottlenecks, improving root-cause localization efficiency by over 90%. This confirms XTrace provides a production-grade solution that reconciles the long-standing conflict between stability and comprehensive coverage in Android dynamic tracing.
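
The claim of "no statistically significant impact (p > 0.05)" can be made concrete with a small worked example. The abstract does not say which statistical test the authors used; the sketch below assumes a two-sided two-proportion z-test, a common choice for comparing a rate such as Crash User Rate between a control group and a treatment group. The function name and all user counts are hypothetical and chosen only for illustration.

```python
import math


def two_proportion_z_test(events_a, users_a, events_b, users_b):
    """Two-sided z-test for a difference between two proportions.

    Hypothetical helper, not taken from the paper.
    events_*: number of users who hit at least one crash (or ANR).
    users_*:  total users in that experiment group.
    Returns (z, p_value).
    """
    rate_a = events_a / users_a
    rate_b = events_b / users_b

    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (events_a + events_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if std_err == 0:
        # No events at all (or all users affected): no detectable difference.
        return 0.0, 1.0

    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: control group (tracing off) vs. treatment group (tracing on).
    z, p = two_proportion_z_test(2_000, 1_000_000, 2_030, 1_000_000)
    verdict = "not significant" if p > 0.05 else "significant"
    print(f"z = {z:.3f}, p = {p:.3f} ({verdict} at the 0.05 level)")
```

With these made-up counts, the small difference between the two groups falls well within sampling noise, so the p-value exceeds 0.05. A result of this kind is what "no statistically significant impact" means in an A/B experiment.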

Problem

Research questions and friction points this paper is trying to address.

Enables dynamic tracing of Android apps without app release
Diagnoses ghost bugs and performance bottlenecks in production
Reconciles stability and coverage in Android dynamic tracing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-invasive proxying avoids VM modification
Optimizes ART instrumentation for high-performance interception
Enables production tracing without app release