🤖 AI Summary
Android vulnerability detection tools suffer from high false-positive rates and low true-positive rates, leading to time-consuming manual validation and frequent omission of critical vulnerabilities. To address this, we propose A2, the first system to integrate agent-based reasoning into Android security analysis. A2 establishes a two-phase closed-loop framework, “vulnerability discovery” followed by “automated verification”, that unifies semantic understanding, static and dynamic program analysis, and multimodal attack-surface modeling (UI interactions, inter-component communication, file system operations, and cryptographic computations), and combines traditional scanners with agent-driven decision-making. It further supports speculative vulnerability identification and end-to-end proof-of-concept (PoC) generation. Evaluated on the Ghera benchmark, A2 achieves 78.3% vulnerability coverage, identifies 82 speculative vulnerabilities (47 Ghera cases plus 28 additional true positives), and generates 51 executable PoCs. On 169 real-world APKs, it detects 104 zero-day vulnerabilities, 57 of which (54.8%) it self-validates with automatically generated PoCs.
📝 Abstract
Existing Android vulnerability detection tools overwhelm teams with thousands of low-signal warnings yet uncover few true positives. Analysts spend days triaging these results, creating a bottleneck in the security pipeline. Meanwhile, genuinely exploitable vulnerabilities often slip through, leaving openings for attackers.
We introduce A2, a system that mirrors how security experts analyze and validate Android vulnerabilities through two complementary phases: (i) Agentic Vulnerability Discovery, which reasons about application security by combining semantic understanding with traditional security tools; and (ii) Agentic Vulnerability Validation, which systematically validates vulnerabilities across Android's multi-modal attack surface: UI interactions, inter-component communication, file system operations, and cryptographic computations.
On the Ghera benchmark (n=60), A2 achieves 78.3% coverage, surpassing state-of-the-art analyzers (e.g., APKHunt at 30.0%). Rather than overwhelming analysts with thousands of warnings, A2 distills results into 82 speculative vulnerability findings, including 47 Ghera cases and 28 additional true positives. Crucially, A2 then generates working proofs of concept (PoCs) for 51 of these speculative findings, transforming them into validated vulnerability findings that provide direct, self-confirming evidence of exploitability.
In real-world evaluation on 169 production APKs, A2 uncovers 104 true-positive zero-day vulnerabilities. Among these, 57 (54.8%) are self-validated with automatically generated PoCs, including a medium-severity vulnerability in a widely used application with over 10 million installs.
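The reported percentages can be cross-checked from the raw counts. The short Python sketch below does so under two assumptions that the abstract does not state outright: that the 78.3% coverage figure corresponds to the 47 Ghera cases found out of the 60 in the benchmark, and that a PoC rate over the 82 speculative findings is a meaningful derived quantity (the abstract gives only the counts 51 and 82, not this ratio). The helper name `pct` is illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Assumption: coverage = Ghera cases found / Ghera benchmark size.
ghera_coverage = pct(47, 60)

# Derived here, not reported in the abstract: share of speculative
# findings that received a working PoC.
poc_rate = pct(51, 82)

# Reported in the abstract: self-validated share of the 104 zero-days.
self_validated = pct(57, 104)

print(f"Ghera coverage:      {ghera_coverage}%")
print(f"Findings with a PoC: {poc_rate}%")
print(f"Self-validated:      {self_validated}%")
```

Under these assumptions the first and third values match the 78.3% and 54.8% stated above.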