Agentic Discovery and Validation of Android App Vulnerabilities

📅 2025-08-29

🤖 AI Summary

Android vulnerability detection tools suffer from high false-positive rates and low true-positive rates, which makes manual validation time-consuming and lets critical vulnerabilities go undetected. To address this, we propose A2, the first system to integrate agent-based reasoning into Android security analysis. A2 establishes a two-phase closed-loop framework, "vulnerability discovery" followed by "automated validation", that unifies semantic understanding, static and dynamic program analysis, and multi-modal attack-surface modeling (UI interactions, inter-component communication, file system operations, and cryptographic computations), and combines traditional scanners with agent-driven decision-making. It further supports speculative vulnerability identification and end-to-end proof-of-concept (PoC) generation. Evaluated on the Ghera benchmark (n=60), A2 achieves 78.3% vulnerability coverage and reports 82 speculative findings (47 Ghera cases and 28 additional true positives), with executable PoCs generated for 51 of them. On 169 real-world APKs, it detects 104 true-positive zero-day vulnerabilities, 57 of which (54.8%) are self-validated with automatically generated PoCs.

📝 Abstract

Existing Android vulnerability detection tools overwhelm teams with thousands of low-signal warnings yet uncover few true positives. Analysts spend days triaging these results, creating a bottleneck in the security pipeline. Meanwhile, genuinely exploitable vulnerabilities often slip through, leaving openings for malicious actors. We introduce A2, a system that mirrors how security experts analyze and validate Android vulnerabilities through two complementary phases: (i) Agentic Vulnerability Discovery, which reasons about application security by combining semantic understanding with traditional security tools; and (ii) Agentic Vulnerability Validation, which systematically validates vulnerabilities across Android's multi-modal attack surface: UI interactions, inter-component communication, file system operations, and cryptographic computations. On the Ghera benchmark (n=60), A2 achieves 78.3% coverage, surpassing state-of-the-art analyzers (e.g., APKHunt at 30.0%). Rather than overwhelming analysts with thousands of warnings, A2 distills results into 82 speculative vulnerability findings, including 47 Ghera cases and 28 additional true positives. Crucially, A2 then generates working Proof-of-Concepts (PoCs) for 51 of these speculative findings, transforming them into validated vulnerability findings that provide direct, self-confirming evidence of exploitability. In a real-world evaluation on 169 production APKs, A2 uncovers 104 true-positive zero-day vulnerabilities. Among these, 57 (54.8%) are self-validated with automatically generated PoCs, including a medium-severity vulnerability in a widely used application with over 10 million installs.
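
The percentages in the abstract can be re-derived from the raw counts it reports. The short Python check below is only a sanity calculation and not part of A2. Pairing the 47 Ghera cases with the 78.3% coverage figure is an inference from the abstract, and the 51-of-82 PoC rate is a derived number that the abstract does not state explicitly.

```python
# Counts as reported in the abstract.
GHERA_TOTAL = 60               # benchmark size (n=60)
GHERA_DETECTED = 47            # Ghera cases among the speculative findings
SPECULATIVE_FINDINGS = 82      # speculative findings on Ghera
POCS_GENERATED = 51            # speculative findings with a working PoC
ZERO_DAYS = 104                # true-positive zero-days in 169 production APKs
ZERO_DAYS_SELF_VALIDATED = 57  # zero-days with an automatically generated PoC


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


ghera_coverage = pct(GHERA_DETECTED, GHERA_TOTAL)             # 78.3, as reported
poc_rate = pct(POCS_GENERATED, SPECULATIVE_FINDINGS)          # 62.2, derived here
self_validated_rate = pct(ZERO_DAYS_SELF_VALIDATED, ZERO_DAYS)  # 54.8, as reported

print(ghera_coverage, poc_rate, self_validated_rate)
```

Both reported percentages (78.3% and 54.8%) agree with the counts given in the abstract.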

Problem

Research questions and friction points this paper is trying to address.

Reduces overwhelming low-signal vulnerability warnings in Android apps

Addresses bottleneck in security analysis through automated triaging

Prevents exploitable vulnerabilities from being missed by current tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Vulnerability Discovery combining semantic understanding with tools

Agentic Vulnerability Validation across multi-modal attack surfaces

Automatically generating working Proof-of-Concepts for self-validation
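
The two phases listed above can be pictured as a filter followed by a per-surface validation step. The Python sketch below is purely illustrative: `Finding`, `discover`, `validate`, the surface labels, and the callback signatures are all hypothetical names chosen for this example, and nothing here reflects how A2 is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Finding:
    """A speculative finding; `validated` becomes True once a PoC succeeds."""
    description: str
    surface: str  # hypothetical labels: "ui", "icc", "filesystem", "crypto"
    validated: bool = False


def discover(warnings: Iterable[Finding],
             is_plausible: Callable[[Finding], bool]) -> List[Finding]:
    """Phase 1 (hypothetical): keep only the warnings a reasoning step accepts."""
    return [w for w in warnings if is_plausible(w)]


def validate(findings: List[Finding],
             poc_runners: Dict[str, Callable[[Finding], bool]]) -> List[Finding]:
    """Phase 2 (hypothetical): run a PoC attempt on the matching attack surface."""
    for finding in findings:
        runner = poc_runners.get(finding.surface)
        # A finding with no runner for its surface stays speculative.
        finding.validated = bool(runner and runner(finding))
    return findings


# Toy data standing in for raw scanner output.
raw = [
    Finding("exported activity accepts untrusted intent", "icc"),
    Finding("debug log statement", "ui"),
    Finding("ECB mode used for encryption", "crypto"),
]
speculative = discover(raw, lambda f: "debug" not in f.description)
results = validate(speculative, {"icc": lambda f: True})
```

In this toy run, the filter drops one warning, the `icc` finding is marked validated, and the `crypto` finding remains speculative because no runner is registered for it. This mirrors the distinction the paper draws between speculative and validated findings.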
