LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models

📅 2025-01-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing automated vulnerability repair methods suffer from low efficiency, high computational cost, and a propensity to introduce new errors. To address these limitations, this paper proposes an iterative, end-to-end LLM-based repair pipeline. Our approach introduces a novel multi-round, feedback-driven LLM repair paradigm that tightly integrates vulnerability context awareness, static analysis–informed prompt engineering, iterative code rewriting, and semantic similarity assessment, augmented by a human-in-the-loop validation stage to ensure semantic fidelity. Evaluated on real-world CVE-vulnerable functions, our implementation using Llama 3 (70B) achieves an average human-assessed patch quality score of 8.51/10 and improves patch semantic similarity to ground-truth fixes by 20%. To foster reproducibility and community advancement, we fully open-source all benchmarking frameworks, fine-tuned model weights, and experimental datasets.
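The multi-round, feedback-driven loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`build_prompt`, `iterative_repair`, `similarity`), the stub model, and the stub checker are all hypothetical, and `difflib.SequenceMatcher` is only a textual stand-in for whatever similarity measure the paper actually uses.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def similarity(a: str, b: str) -> float:
    """Textual similarity in [0, 1]; an assumed stand-in for the paper's metric."""
    return SequenceMatcher(None, a, b).ratio()


def build_prompt(vulnerable_code: str, cve_description: str, feedback: List[str]) -> str:
    """Combine the vulnerable function, CVE context, and feedback from earlier rounds."""
    parts = [
        "Fix the vulnerability in the following function.",
        f"CVE context: {cve_description}",
        "Function:",
        vulnerable_code,
    ]
    if feedback:
        parts.append("Feedback from previous attempts:")
        parts.extend(f"- {item}" for item in feedback)
    return "\n".join(parts)


def iterative_repair(
    vulnerable_code: str,
    cve_description: str,
    generate: Callable[[str], str],      # hypothetical LLM call: prompt -> candidate code
    check: Callable[[str], List[str]],   # hypothetical analyzer: code -> list of problems
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Ask for a patch, check it, and feed any problems back into the next prompt."""
    feedback: List[str] = []
    candidate = vulnerable_code
    for round_no in range(1, max_rounds + 1):
        candidate = generate(build_prompt(vulnerable_code, cve_description, feedback))
        problems = check(candidate)
        if not problems:
            return candidate, round_no   # accepted: no remaining findings
        feedback.extend(problems)        # carry findings into the next round
    return candidate, max_rounds         # best effort after the round budget


# Toy inputs so the loop can be run without a real model or analyzer.
VULNERABLE = "void copy(char *dst, const char *src) { strcpy(dst, src); }"
REFERENCE = "void copy(char *dst, const char *src, size_t n) { strncpy(dst, src, n); }"


def fake_model(prompt: str) -> str:
    """Stub model: repeats the bug at first, fixes it once feedback is present."""
    if "Feedback from previous attempts:" in prompt:
        return REFERENCE
    return VULNERABLE


def fake_check(code: str) -> List[str]:
    """Stub checker: flags any unbounded strcpy call."""
    if "strcpy(" in code:
        return ["strcpy is unbounded; use a length-limited copy"]
    return []


if __name__ == "__main__":
    patch, rounds = iterative_repair(VULNERABLE, "unbounded copy", fake_model, fake_check)
    print(rounds, round(similarity(patch, REFERENCE), 2))
```

With the stubs above, the first round is rejected and the second is accepted. In a real setup, `generate` would wrap a model such as Llama 3 70B, and a human review step would follow the automated check.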

📝 Abstract

Software vulnerabilities continue to be ubiquitous, even in the era of AI-powered code assistants, advanced static analysis tools, and the adoption of extensive testing frameworks. It has become apparent that we must not simply prevent these bugs, but also eliminate them in a quick, efficient manner. Yet human code intervention is slow, costly, and can often lead to further security vulnerabilities, especially in legacy codebases. The advent of highly advanced Large Language Models (LLMs) has opened up the possibility for many software defects to be patched automatically. We propose LLM4CVE, an LLM-based iterative pipeline that robustly fixes vulnerable functions in real-world code with high accuracy. We examine our pipeline with state-of-the-art LLMs, such as GPT-3.5, GPT-4o, Llama 3 8B, and Llama 3 70B. We achieve a human-verified quality score of 8.51/10 and an increase in ground-truth code similarity of 20% with Llama 3 70B. To promote further research in the area of LLM-based vulnerability repair, we publish our testing apparatus, fine-tuned weights, and experimental data on our website.

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Automated Vulnerability Repair

Software Security

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM4CVE Process

Large Language Models

CVE Identification and Repair

🔎 Similar Papers

APPATCH: Automated Adaptive Prompting Large Language Models for Real-World Software Vulnerability Patching