🤖 AI Summary
Existing approaches struggle to verify automatically and reliably whether vulnerabilities in third-party components are practically exploitable in black-box web applications. This work proposes AutoEG, a fully automated multi-agent framework that uses natural language processing to extract precise triggering logic from unstructured vulnerability descriptions and encapsulate it into reusable trigger functions. By iteratively refining attack payloads based on feedback from the target, AutoEG adapts its exploits to diverse real-world deployment environments. Evaluated on 104 real-world vulnerabilities with 29 attack objectives, AutoEG completed 660 exploitation tasks (55,440 exploit attempts) with an average success rate of 82.41%, substantially outperforming the best state-of-the-art baseline (32.88%) and advancing the automation and effectiveness of exploitability verification in black-box settings.
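As a quick consistency check on the evaluation figures, using only numbers stated in this summary and the abstract, the attempt and task counts imply an average of 84 attempts per exploitation task, and the gap to the strongest baseline is roughly 49.5 percentage points. The text does not say how attempts are distributed across tasks or how the average success rate is aggregated, so these are plain ratios, not a reconstruction of the paper's metric.

```latex
\begin{aligned}
\frac{55{,}440\ \text{attempts}}{660\ \text{tasks}} &= 84\ \text{attempts per task (mean)},\\
82.41\% - 32.88\% &= 49.53\ \text{percentage points},\\
82.41 / 32.88 &\approx 2.51.
\end{aligned}
```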
📝 Abstract
Large-scale web applications are commonly built on complex third-party components and therefore inherit the security risks of those components' vulnerabilities. Security assessment must determine whether such known vulnerabilities remain practically exploitable in real applications. Penetration testing is a widely adopted approach that validates exploitability by launching concrete attacks against known vulnerabilities in real-world black-box systems. However, existing approaches often fail to generate reliable exploits automatically, which limits their effectiveness in practical security assessment. This limitation stems mainly from two challenges: (1) precisely triggering vulnerabilities with the correct technical details, and (2) adapting exploits to diverse real-world deployment settings.
In this paper, we propose AutoEG, a fully automated multi-agent framework for exploit generation against black-box web applications. AutoEG operates in two phases. First, it extracts precise vulnerability trigger logic from unstructured vulnerability information and encapsulates it into reusable trigger functions. Second, it applies these trigger functions to concrete attack objectives and iteratively refines the exploits through feedback-driven interaction with the target application. We evaluate AutoEG on 104 real-world vulnerabilities with 29 attack objectives, yielding 660 exploitation tasks and 55,440 exploit attempts. AutoEG achieves an average success rate of 82.41%, substantially outperforming state-of-the-art baselines, the best of which reaches only 32.88%.
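The two phases can be pictured with a small, self-contained sketch. Everything in it is hypothetical and is not AutoEG's implementation: `TriggerFunction` stands in for a Phase 1 trigger function (the endpoint and parameter that reach the vulnerable code), `toy_target` is a local simulated application that reflects input after a naive case-insensitive `<script>` filter, and `refine` is a fixed rule standing in for the agents' feedback-driven reasoning. No network requests are made.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Request = Dict[str, Any]  # hypothetical request shape: {"path": ..., "params": {...}}


@dataclass
class Feedback:
    """What the (simulated) black-box target returns for one attempt."""
    status: int
    body: str


@dataclass
class TriggerFunction:
    """Hypothetical Phase 1 output: reusable trigger logic, i.e. which
    endpoint and parameter reach the vulnerable code path."""
    path: str
    param: str

    def build(self, payload: str) -> Request:
        # Wrap an objective-specific payload in the vulnerability's request shape.
        return {"path": self.path, "params": {self.param: payload}}


def toy_target(request: Request) -> Feedback:
    """Simulated target (no network): reflects `q` after one
    case-insensitive pass that deletes the literal '<script>'."""
    if request["path"] != "/search":
        return Feedback(404, "")
    value = request["params"].get("q", "")
    cleaned = re.sub("<script>", "", value, flags=re.IGNORECASE)
    return Feedback(200, f"<p>{cleaned}</p>")


MARKER = "<script>alert(1)</script>"  # toy objective: reflected intact


def objective_met(fb: Feedback) -> bool:
    return fb.status == 200 and MARKER in fb.body


def refine(payload: str, fb: Feedback) -> Optional[str]:
    """Rule-based stand-in for agent reasoning: inspect the response and
    propose the next payload, or None if no rule applies."""
    body = fb.body.lower()
    if "alert(1)" in body and "<script>" not in body:
        # The opening tag was stripped: nest it so the single-pass filter
        # reassembles '<script>' from the surrounding fragments.
        return payload.replace("<script>", "<scr<script>ipt>", 1)
    return None


def run_exploit_loop(
    trigger: TriggerFunction,
    target: Callable[[Request], Feedback],
    payload: str,
    budget: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Phase 2 sketch: send, observe feedback, refine, until success or budget."""
    tried: List[str] = []
    for _ in range(budget):
        fb = target(trigger.build(payload))
        tried.append(payload)
        if objective_met(fb):
            return payload, tried
        next_payload = refine(payload, fb)
        if next_payload is None:
            break  # no rule matches this feedback; stop early
        payload = next_payload
    return None, tried


if __name__ == "__main__":
    trigger = TriggerFunction(path="/search", param="q")
    winner, tried = run_exploit_loop(trigger, toy_target, MARKER)
    print(winner, len(tried))  # prints: <scr<script>ipt>alert(1)</script> 2
```

In this toy run, the first response shows the opening tag stripped, so `refine` nests it; the filter's single pass then reassembles the tag and the second attempt meets the objective. Keeping the trigger (where to send) separate from the payload (what to send) is one simple way to let a single trigger function serve several attack objectives, which is the kind of reuse the abstract describes.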