Multi-Agent Penetration Testing AI for the Web

📅 2025-08-28

🤖 AI Summary

The widespread adoption of AI-generated code has precipitated a scalability crisis in security auditing: studies report that up to 40% of such code contains vulnerabilities, while manual auditing lags well behind development velocity. To address this, the authors propose MAPTA, a multi-agent penetration testing system for web applications that combines large language model (LLM) orchestration with tool-grounded execution to form a closed-loop workflow from vulnerability discovery through exploit validation. Their cost analysis shows that success correlates with resource efficiency, which supports early-stopping thresholds of roughly 40 tool calls or $0.30 per challenge. On the 104-challenge XBOW benchmark, the system achieves an overall success rate of 76.9%, with perfect results (100%) on SSRF and misconfiguration vulnerabilities. It also uncovers critical flaws, including remote code execution (RCE) and command injection, in GitHub repositories with 8K-70K stars. The average cost is $3.67 per open-source assessment, which points to practical viability at low cost.

📝 Abstract

AI-powered development platforms are making software creation accessible to a broader audience, but this democratization has triggered a scalability crisis in security auditing. With studies showing that up to 40% of AI-generated code contains vulnerabilities, the pace of development now vastly outstrips the capacity for thorough security assessment. We present MAPTA, a multi-agent system for autonomous web application security assessment that combines large language model orchestration with tool-grounded execution and end-to-end exploit validation. On the 104-challenge XBOW benchmark, MAPTA achieves 76.9% overall success with perfect performance on SSRF and misconfiguration vulnerabilities, 83% success on broken authorization, and strong results on injection attacks including server-side template injection (85%) and SQL injection (83%). Cross-site scripting (57%) and blind SQL injection (0%) remain challenging. Our comprehensive cost analysis across all challenges totals $21.38, with a median cost of $0.073 for successful attempts versus $0.357 for failures. Success correlates strongly with resource efficiency, enabling practical early-stopping thresholds at approximately 40 tool calls or $0.30 per challenge. MAPTA's real-world findings are significant given both the popularity of the scanned GitHub repositories (8K-70K stars) and MAPTA's low average operating cost of $3.67 per open-source assessment: MAPTA discovered critical vulnerabilities including RCEs, command injections, secret exposure, and arbitrary file write vulnerabilities. All findings are responsibly disclosed, and 10 are under CVE review.
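The early-stopping thresholds mentioned in the abstract (about 40 tool calls or $0.30 per challenge) can be illustrated with a small budget guard. This is a minimal sketch, not MAPTA's implementation: the names `Budget` and `run_with_early_stop` are hypothetical, per-step costs are supplied by the caller, and treating the two thresholds as hard limits combined with a logical OR is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract. Using them as hard cut-offs is an
# assumption made for this sketch.
MAX_TOOL_CALLS = 40
MAX_COST_USD = 0.30


@dataclass
class Budget:
    """Tracks tool calls and spend for a single challenge (hypothetical helper)."""
    tool_calls: int = 0
    cost_usd: float = 0.0

    def record(self, call_cost_usd: float) -> None:
        """Account for one tool call and its cost."""
        self.tool_calls += 1
        self.cost_usd += call_cost_usd

    def exhausted(self) -> bool:
        """True once either threshold has been reached."""
        return self.tool_calls >= MAX_TOOL_CALLS or self.cost_usd >= MAX_COST_USD


def run_with_early_stop(step_costs, solved_at=None):
    """Simulate an assessment loop.

    step_costs: cost in USD of each successive tool call.
    solved_at:  1-based index of the call that yields a validated exploit,
                or None if the attempt never succeeds.
    Returns a (status, budget) pair.
    """
    budget = Budget()
    for i, cost in enumerate(step_costs, start=1):
        budget.record(cost)
        if solved_at is not None and i == solved_at:
            return "solved", budget
        if budget.exhausted():
            return "stopped", budget
    return "ran_out_of_steps", budget


if __name__ == "__main__":
    status, budget = run_with_early_stop([0.001] * 100)
    print(status, budget.tool_calls)
```

Because the abstract reports a much lower median cost for successful attempts ($0.073) than for failures ($0.357), a guard of this kind would mostly cut off attempts that are unlikely to succeed.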

Problem

Research questions and friction points this paper is trying to address.

Addressing the scalability crisis in security auditing of AI-generated code

Autonomous web application security assessment with multi-agent AI

Validating exploits and identifying vulnerabilities in real-world systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system with LLM orchestration

Tool-grounded execution for vulnerability detection

End-to-end exploit validation methodology
