Automated Vulnerability Validation and Verification: A Large Language Model Approach

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Software vulnerability assessment has long been hindered by the scarcity of high-quality, diverse exploit-behavior datasets. This paper introduces an end-to-end automated framework that combines large language models (LLMs) with retrieval-augmented generation (RAG), drawing on multi-source external knowledge such as CVE descriptions, threat advisories, and code snippets to generate containerized experimental environments and executable exploit code. The pipeline systematically reproduces and validates multiple vulnerability classes, including memory overflows, denial of service, and remote code execution, and, to the authors' knowledge, is the first system to orchestrate exploits of known vulnerabilities across multi-container environments by combining general-purpose LLM reasoning with CVE data and RAG-based context enrichment. In the process it uncovers pervasive semantic inconsistencies in CVE descriptions, underscoring the need for more rigorous verification in the disclosure process. Evaluated across diverse programming languages and libraries, the method achieves high reproduction fidelity, and all artifacts, including environments, exploits, and evaluation scripts, are open-sourced to improve reproducibility and assessment efficiency in security research.
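
As a concrete illustration of the validation step described above, the sketch below runs an attacker container holding the generated exploit against a target container on a shared Docker network and judges success from the attacker's output. It is a minimal sketch using the docker-py SDK; the image tags, container names, and the "EXPLOIT_OK" success marker are illustrative assumptions, not artifacts of the paper.

```python
# Hedged sketch: validate a generated exploit by running attacker and target
# containers on a shared Docker network. Image tags and the "EXPLOIT_OK"
# marker are illustrative assumptions, not the paper's artifacts.
import docker


def validate_exploit(target_image: str, attacker_image: str, marker: str = "EXPLOIT_OK") -> bool:
    client = docker.from_env()
    network = client.networks.create("cve-validation-net", driver="bridge")
    target = attacker = None
    try:
        # Vulnerable service that the generated exploit is launched against.
        target = client.containers.run(
            target_image, detach=True, name="cve-target", network="cve-validation-net"
        )
        # Attacker container runs the LLM-generated exploit script and prints a
        # marker on success (an illustrative convention, not the paper's).
        attacker = client.containers.run(
            attacker_image, detach=True, name="cve-attacker", network="cve-validation-net"
        )
        attacker.wait(timeout=120)   # block until the exploit script exits
        logs = attacker.logs().decode(errors="replace")
        return marker in logs        # crude test case: success marker seen in output
    finally:
        for container in (attacker, target):
            if container is not None:
                container.remove(force=True)
        network.remove()


if __name__ == "__main__":
    print(validate_exploit("cve-target:demo", "cve-attacker:demo"))
```

In a real run, a vulnerability-specific test case (e.g., a crash signature, a leaked secret, or an attacker-controlled file appearing on the target) would replace the simple marker check.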

📝 Abstract
Software vulnerabilities remain a critical security challenge, providing entry points for attackers into enterprise networks. Despite advances in security practices, the lack of high-quality datasets capturing diverse exploit behavior limits effective vulnerability assessment and mitigation. This paper introduces an end-to-end multi-step pipeline leveraging generative AI, specifically large language models (LLMs), to address the challenges of orchestrating and reproducing attacks on known software vulnerabilities. Our approach extracts information from CVE disclosures in the National Vulnerability Database, augments it with external public knowledge (e.g., threat advisories, code snippets) using Retrieval-Augmented Generation (RAG), and automates the creation of containerized environments and exploit code for each vulnerability. The pipeline iteratively refines generated artifacts, validates attack success with test cases, and supports complex multi-container setups. Our methodology overcomes key obstacles, including noisy and incomplete vulnerability descriptions, by integrating LLMs and RAG to fill information gaps. We demonstrate the effectiveness of our pipeline across different vulnerability types, such as memory overflows, denial of service, and remote code execution, spanning diverse programming languages, libraries, and years. In doing so, we uncover significant inconsistencies in CVE descriptions, emphasizing the need for more rigorous verification in the CVE disclosure process. Our approach is model-agnostic, working across multiple LLMs, and we open-source the artifacts to enable reproducibility and accelerate security research. To the best of our knowledge, this is the first system to systematically orchestrate and exploit known vulnerabilities in containerized environments by combining general-purpose LLM reasoning with CVE data and RAG-based context enrichment.
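
Read as pseudocode, the abstract describes a generate-validate-refine loop over NVD data. The sketch below is a minimal rendering of that loop, assuming the public NVD 2.0 REST API for CVE retrieval and placeholder functions for the RAG retriever, the LLM generation step, and the build-and-test harness (such as the container check sketched above); none of the function names correspond to the paper's actual components.

```python
# Hedged sketch of the end-to-end loop: fetch CVE text from NVD, augment it with
# retrieved external context, ask an LLM for a Dockerfile plus exploit, and iterate
# until a validation test case passes. retrieve_context / generate_artifacts /
# validate are placeholders standing in for the paper's components.
import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def fetch_cve_description(cve_id: str) -> str:
    """Pull the English description of one CVE from the NVD 2.0 API."""
    resp = requests.get(NVD_URL, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    vulns = resp.json().get("vulnerabilities", [])
    if not vulns:
        return ""
    descriptions = vulns[0].get("cve", {}).get("descriptions", [])
    return next((d["value"] for d in descriptions if d.get("lang") == "en"), "")


def retrieve_context(description: str) -> str:
    """Placeholder for the RAG step (advisories, code snippets, patches)."""
    return ""


def generate_artifacts(description: str, context: str, feedback: str) -> tuple[str, str]:
    """Placeholder for the LLM call that writes a Dockerfile and an exploit script."""
    raise NotImplementedError


def validate(dockerfile: str, exploit: str) -> tuple[bool, str]:
    """Placeholder for the build-run-test harness."""
    raise NotImplementedError


def reproduce_cve(cve_id: str, max_rounds: int = 3) -> bool:
    description = fetch_cve_description(cve_id)
    context = retrieve_context(description)           # fill gaps in the CVE text
    feedback = ""
    for _ in range(max_rounds):
        # The LLM proposes a containerized environment plus exploit code; feedback
        # from the previous failed attempt drives the iterative refinement.
        dockerfile, exploit = generate_artifacts(description, context, feedback)
        ok, feedback = validate(dockerfile, exploit)
        if ok:
            return True
    return False
```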
Problem

Research questions and friction points that this paper addresses.

Automating vulnerability validation using large language models
Addressing incomplete CVE descriptions with retrieval-augmented generation
Systematically reproducing attacks in containerized environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for automated vulnerability validation pipeline
Using RAG to augment CVE data with external knowledge (see the sketch after this list)
Automating containerized exploit generation for diverse vulnerabilities
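
To make the RAG augmentation concrete, the sketch below ranks a handful of external knowledge snippets against a CVE description with a plain term-frequency cosine similarity and splices the best matches into a generation prompt. The corpus, scoring function, and prompt template are illustrative stand-ins for the retriever and knowledge sources (threat advisories, code snippets) referenced in the paper.

```python
# Hedged sketch of the RAG augmentation step: rank external knowledge snippets
# against a CVE description with a simple term-frequency cosine similarity and
# splice the best matches into the prompt. The corpus and prompt template are
# illustrative, not the paper's.
import math
from collections import Counter


def _tf(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def augment_prompt(cve_description: str, corpus: list[str], top_k: int = 2) -> str:
    query = _tf(cve_description)
    ranked = sorted(corpus, key=lambda doc: _cosine(query, _tf(doc)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return (
        "CVE description:\n" + cve_description +
        "\n\nRetrieved context:\n" + context +
        "\n\nGenerate a Dockerfile and an exploit script that reproduce this vulnerability."
    )


if __name__ == "__main__":
    corpus = [
        "Advisory: heap buffer overflow in the image parser when width * height overflows.",
        "Patch commit adds a bounds check before memcpy in parse_chunk().",
        "Unrelated note about documentation typos.",
    ]
    print(augment_prompt("Buffer overflow in image parser allows remote code execution.", corpus))
```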