🤖 AI Summary
Large language models (LLMs) have a fixed knowledge cutoff, so they can generate code that is vulnerable to CVEs disclosed after that date. This paper proposes AutoPatch, a multi-agent framework that identifies, verifies, and patches such vulnerabilities in LLM-generated code. The method combines semantic matching with taint analysis to match code against known CVEs, uses enhanced chain-of-thought (CoT) prompting to reason about complex vulnerabilities without being given explicit bug locations, and builds a structured RAG knowledge base of 525 code snippets derived from 75 high-severity CVEs in real-world systems such as the Linux kernel and Chrome. Experiments report 90.4% CVE matching accuracy, an 89.5% F1-score for vulnerability verification, and 95.0% patching accuracy, while being over 50× more cost-efficient than traditional fine-tuning.
📝 Abstract
Large Language Models (LLMs) have emerged as promising tools in software development, enabling automated code generation and analysis. However, their knowledge is limited to a fixed cutoff date, making them prone to generating code vulnerable to newly disclosed CVEs. Frequent fine-tuning with new CVE sets is costly, and existing LLM-based approaches focus on oversimplified CWE examples and require explicit bug locations to be provided to the LLM, limiting their ability to patch complex real-world vulnerabilities. To address these limitations, we propose AutoPatch, a multi-agent framework designed to patch vulnerable LLM-generated code, particularly code affected by vulnerabilities disclosed after the LLMs' knowledge cutoff. AutoPatch integrates Retrieval-Augmented Generation (RAG) with a structured database of recently disclosed vulnerabilities, comprising 525 code snippets derived from 75 high-severity CVEs across real-world systems such as the Linux kernel and Chrome. AutoPatch combines semantic and taint analysis to identify the most relevant CVE and leverages enhanced Chain-of-Thought (CoT) reasoning to construct enriched prompts for verification and patching. Our unified similarity model, which selects the most relevant vulnerabilities, achieves 90.4 percent accuracy in CVE matching. AutoPatch attains 89.5 percent F1-score for vulnerability verification and 95.0 percent accuracy in patching, while being over 50x more cost-efficient than traditional fine-tuning approaches.
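The abstract does not say how the unified similarity model combines its semantic and taint signals. As a rough illustration only, the sketch below assumes a weighted sum of a cosine similarity over snippet embeddings and a Jaccard overlap over (source, sink) taint pairs. The `CveEntry` structure, the `alpha` weight, the placeholder CVE identifiers, and all function names are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class CveEntry:
    """Hypothetical knowledge-base record for one vulnerable snippet."""
    cve_id: str
    embedding: list[float]             # assumed semantic embedding of the snippet
    taint_flows: set[tuple[str, str]]  # assumed (source, sink) pairs from taint analysis


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unified_score(query_emb, query_flows, entry: CveEntry, alpha: float = 0.5) -> float:
    """Assumed combination: alpha * semantic + (1 - alpha) * taint similarity."""
    semantic = cosine(query_emb, entry.embedding)
    taint = jaccard(query_flows, entry.taint_flows)
    return alpha * semantic + (1.0 - alpha) * taint


def best_match(query_emb, query_flows, db: list[CveEntry], alpha: float = 0.5) -> CveEntry:
    """Return the knowledge-base entry with the highest unified score."""
    return max(db, key=lambda e: unified_score(query_emb, query_flows, e, alpha))


# Toy knowledge base with placeholder identifiers (not real CVEs).
db = [
    CveEntry("CVE-EXAMPLE-1", [1.0, 0.0], {("recv", "memcpy")}),
    CveEntry("CVE-EXAMPLE-2", [0.0, 1.0], {("read", "free")}),
]
match = best_match([0.9, 0.1], {("recv", "memcpy")}, db)
print(match.cve_id)
```

In a retrieval-augmented setup of the kind the abstract describes, the matched entry would then supply context for the verification and patching prompts. The equal weighting used here is arbitrary and would need tuning on labeled matches.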