🤖 AI Summary
This work addresses the inefficiency of existing vulnerability reproduction methods, which often conflate attack strategy with code implementation. The authors propose CVE2PoC, a framework with a dual-loop architecture: a strategic planner generates structured attack plans from CVE semantics, while a tactical executor produces and validates executable proof-of-concept (PoC) code. An adaptive refiner routes each failure to the appropriate loop for targeted refinement based on failure type. Integrating LLM agents within a plan-execute-evaluate paradigm, the approach achieves reproduction success rates of 82.9% on SecBench.js and 54.3% on PatchEval, outperforming the best baseline by 11.3% and 20.4%, respectively. Moreover, the generated PoCs exhibit readability and reusability comparable to human-written counterparts.
📝 Abstract
Automated vulnerability reproduction from CVE descriptions requires generating executable Proof-of-Concept (PoC) exploits and validating them in target environments. This process is critical in software security research and practice, yet it remains time-consuming and demands specialized expertise when performed manually. While LLM agents show promise for automating this task, existing approaches often conflate exploring attack directions with fixing implementation details, which leads to unproductive debugging loops when reproduction fails. To address this, we propose CVE2PoC, an LLM-based dual-loop agent framework following a plan-execute-evaluate paradigm. The Strategic Planner analyzes vulnerability semantics and target code to produce structured attack plans. The Tactical Executor generates PoC code and validates it through progressive verification. The Adaptive Refiner evaluates execution results and routes each failure to one of two loops: the Tactical Loop for code-level refinement, or the Strategic Loop for attack strategy replanning. This dual-loop design enables the framework to escape ineffective debugging by matching remediation to failure type. Evaluation on two benchmarks covering 617 real-world vulnerabilities demonstrates that CVE2PoC achieves 82.9% and 54.3% reproduction success rates on SecBench.js and PatchEval, respectively, outperforming the best baseline by 11.3% and 20.4%. Human evaluation confirms that generated PoCs achieve code quality comparable to human-written exploits in readability and reusability.
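The dual-loop routing described above can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the names `reproduce`, `Verdict`, and `Result`, the retry budgets, and the rule that an exhausted tactical budget escalates to replanning are all assumptions made for illustration, and the planner, executor, and refiner are passed in as plain callables rather than LLM agents.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class Verdict(Enum):
    """Hypothetical failure taxonomy; the abstract only says routing depends on failure type."""
    SUCCESS = auto()
    TACTICAL_FAILURE = auto()   # code-level problem: keep the plan, fix the PoC
    STRATEGIC_FAILURE = auto()  # wrong attack direction: discard the plan, replan


@dataclass
class Result:
    verdict: Verdict
    feedback: str = ""


def reproduce(
    plan_fn: Callable[[str, list], str],       # stands in for the Strategic Planner
    execute_fn: Callable[[str, list], str],    # stands in for the Tactical Executor
    evaluate_fn: Callable[[str, str], Result],  # stands in for the Adaptive Refiner
    cve_description: str,
    max_plans: int = 3,   # assumed budget, not from the source
    max_fixes: int = 3,   # assumed budget, not from the source
) -> Optional[str]:
    """Return a validated PoC string, or None if every budget is exhausted."""
    strategic_notes: list = []
    for _ in range(max_plans):                  # Strategic Loop: replan the attack
        plan = plan_fn(cve_description, strategic_notes)
        tactical_notes: list = []
        for _ in range(max_fixes):              # Tactical Loop: refine the code
            poc = execute_fn(plan, tactical_notes)
            result = evaluate_fn(plan, poc)
            if result.verdict is Verdict.SUCCESS:
                return poc
            if result.verdict is Verdict.STRATEGIC_FAILURE:
                strategic_notes.append(result.feedback)
                break                           # leave the Tactical Loop and replan
            tactical_notes.append(result.feedback)
        else:
            # Assumption: running out of code-level fixes is treated as a plan problem.
            strategic_notes.append("tactical budget exhausted for " + plan)
    return None


# Toy stand-ins so the sketch runs without any LLM or target environment.
def demo_plan(cve: str, notes: list) -> str:
    return f"plan-{len(notes)}"


def demo_execute(plan: str, notes: list) -> str:
    return f"{plan}/poc-{len(notes)}"


def demo_evaluate(plan: str, poc: str) -> Result:
    if plan == "plan-0":
        return Result(Verdict.STRATEGIC_FAILURE, "entry point unreachable")
    if poc.endswith("poc-0"):
        return Result(Verdict.TACTICAL_FAILURE, "payload raised TypeError")
    return Result(Verdict.SUCCESS)


if __name__ == "__main__":
    print(reproduce(demo_plan, demo_execute, demo_evaluate, "example CVE text"))
```

In the toy run, the first plan is rejected as a strategic failure, so no code-level retries are spent on it. The second plan fails once at the code level and succeeds after one tactical fix, which is the "match remediation to failure type" behavior the abstract describes.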