From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs

📅 2025-09-01
🤖 AI Summary
High-quality exploit datasets are scarce because building them by hand is expensive and requires expert knowledge. To address this, the paper introduces CVE-GENIE, a large language model (LLM)-based multi-agent framework, described as the first of its kind, that generates verifiable exploits end to end from CVE entries. CVE-GENIE combines cross-source resource retrieval, automated reconstruction of the vulnerable environment, and exploit synthesis, with specialized agents collaborating on each stage. Evaluated on 841 CVEs published in 2024-2025, it reproduced 428 of them (about 51%) at an average cost of $2.77 per CVE, each with a verifiable exploit. The pipeline lowers the barrier to building reproducible CVE benchmarks, which are useful for fuzzer evaluation, vulnerability patching, and assessing the security capabilities of AI systems.
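As a quick check on the headline figure, the reported counts can be turned into a rate directly. The helper name `success_rate` is ours, not something from the paper.

```python
def success_rate(reproduced: int, attempted: int) -> float:
    """Fraction of attempted CVEs that were reproduced."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return reproduced / attempted


# Counts as reported in the summary: 428 reproduced out of 841 attempted.
rate = success_rate(428, 841)
print(f"{rate:.1%}")  # 50.9%, which the summary rounds to 51%
```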

📝 Abstract
High-quality datasets of real-world vulnerabilities and their corresponding verifiable exploits are crucial resources in software security research. Yet such resources remain scarce, as their creation demands intensive manual effort and deep security expertise. In this paper, we present CVE-GENIE, an automated, large language model (LLM)-based multi-agent framework designed to reproduce real-world vulnerabilities, provided in Common Vulnerabilities and Exposures (CVE) format, to enable creation of high-quality vulnerability datasets. Given a CVE entry as input, CVE-GENIE gathers the relevant resources of the CVE, automatically reconstructs the vulnerable environment, and (re)produces a verifiable exploit. Our systematic evaluation highlights the efficiency and robustness of CVE-GENIE's design: it successfully reproduces approximately 51% (428 of 841) of the CVEs published in 2024-2025, complete with their verifiable exploits, at an average cost of $2.77 per CVE. Our pipeline offers a robust method to generate reproducible CVE benchmarks, valuable for diverse applications such as fuzzer evaluation, vulnerability patching, and assessing AI's security capabilities.
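The three stages named in the abstract (gather resources, reconstruct the environment, produce and verify an exploit) can be pictured as a simple staged pipeline. The following is a minimal sketch under our own assumptions, not the authors' implementation: every name here (`CveEntry`, `ReproductionResult`, `reproduce`, and the four stage callables) is hypothetical, and the stages are passed in as plain functions where the real system would use LLM agents.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CveEntry:
    """Hypothetical container for the pipeline input."""
    cve_id: str
    description: str


@dataclass
class ReproductionResult:
    """Hypothetical record of how far the pipeline got for one CVE."""
    cve_id: str
    resources: list[str] = field(default_factory=list)
    environment_ready: bool = False
    exploit: Optional[str] = None
    verified: bool = False


def reproduce(
    entry: CveEntry,
    gather: Callable[[CveEntry], list[str]],
    build_env: Callable[[CveEntry, list[str]], bool],
    write_exploit: Callable[[CveEntry, list[str]], Optional[str]],
    verify: Callable[[str], bool],
) -> ReproductionResult:
    """Run the stages in order and stop at the first one that fails."""
    result = ReproductionResult(cve_id=entry.cve_id)

    # Stage 1: collect advisories, patches, source links, and so on.
    result.resources = gather(entry)
    if not result.resources:
        return result

    # Stage 2: rebuild the vulnerable environment from those resources.
    result.environment_ready = build_env(entry, result.resources)
    if not result.environment_ready:
        return result

    # Stage 3: produce an exploit, then check that it actually works.
    result.exploit = write_exploit(entry, result.resources)
    if result.exploit is not None:
        result.verified = verify(result.exploit)
    return result


if __name__ == "__main__":
    # Placeholder entry and stub stages, for illustration only.
    demo = CveEntry("CVE-0000-0000", "placeholder description")
    outcome = reproduce(
        demo,
        gather=lambda e: ["advisory.txt"],
        build_env=lambda e, r: True,
        write_exploit=lambda e, r: "poc.py",
        verify=lambda exploit: True,
    )
    print(outcome)
```

Under this framing, a CVE counts as reproduced only when `verified` is true, which mirrors the abstract's emphasis on verifiable exploits rather than merely generated ones.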
Problem

Research questions and friction points this paper is trying to address.

Automating the reproduction of real-world vulnerabilities from CVE entries
Generating verifiable exploits to create high-quality vulnerability datasets
Reducing manual effort and expertise needed for exploit verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based multi-agent framework for vulnerability reproduction
Automated reconstruction of vulnerable environments from CVEs
Low-cost generation of verifiable exploits (average $2.77 per CVE)