AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning

📅 2025-08-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the scarcity of vulnerability detection data and the heavy reliance of existing AI-based methods on high-quality labeled data, this paper proposes a multi-agent collaborative vulnerability injection framework. The framework integrates a function-level code understanding agent, static analysis tools, and retrieval-augmented generation (RAG) to enable context-aware, category-specific, and realistic vulnerability injection. It further introduces low-rank adaptation (LoRA) to improve fine-tuning efficiency while ensuring safety and controllability during injection. Experiments across three benchmarks comprising 116 C/C++ functions demonstrate a function-level vulnerability injection success rate of 89%–95%, substantially outperforming baseline approaches. The generated dataset exhibits high fidelity and comprehensive coverage across vulnerability categories, establishing a high-quality, scalable data foundation for training AI-driven vulnerability detection models.

📝 Abstract
The increasing complexity of software systems and the sophistication of cyber-attacks have underscored the critical need for effective automated vulnerability detection and repair systems. Traditional methods, such as static program analysis, face significant challenges related to scalability, adaptability, and high false-positive and false-negative rates. AI-driven approaches, particularly those using machine learning and deep learning models, show promise but are heavily reliant on the quality and quantity of training data. This paper introduces a novel framework designed to automatically introduce realistic, category-specific vulnerabilities into secure C/C++ codebases to generate datasets. The proposed approach coordinates multiple AI agents that simulate expert reasoning, along with function agents and traditional code analysis tools. It leverages Retrieval-Augmented Generation for contextual grounding and employs Low-Rank approximation of weights for efficient model fine-tuning. Our experimental study on 116 code samples from three different benchmarks suggests that our approach outperforms other techniques with regard to dataset accuracy, achieving between 89% and 95% success rates in injecting vulnerabilities at the function level.
Problem

Research questions and friction points this paper is trying to address.

Automated vulnerability detection faces scalability and accuracy challenges
AI methods require high-quality training data for effectiveness
Generating realistic vulnerability datasets for C/C++ codebases
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI agents simulate expert reasoning for vulnerabilities
Retrieval-Augmented Generation provides contextual grounding
Low-Rank approximation enables efficient model fine-tuning
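The low-rank adaptation (LoRA) idea cited above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the layer dimensions and rank below are hypothetical, chosen only to show why training a low-rank update is far cheaper than a full fine-tune.

```python
import numpy as np

# LoRA sketch: instead of updating a full weight matrix W (d x k),
# train a low-rank update B @ A with rank r << min(d, k).
d, k, r = 512, 512, 8  # hypothetical layer dimensions and adapter rank

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weights
A = rng.standard_normal((r, k)) * 0.01   # trainable, shape (r, k)
B = np.zeros((d, r))                     # trainable, zero-initialized

def forward(x):
    # Effective weights are W + B @ A; since B starts at zero,
    # the adapted model initially matches the base model exactly.
    return x @ (W + B @ A).T

full_params = d * k          # parameters updated by a full fine-tune
lora_params = r * (d + k)    # parameters updated by the LoRA adapter
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
```

At rank 8 the adapter trains roughly 3% of the parameters a full update would touch, which is the efficiency gain the summary attributes to LoRA.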
Amine Lbath
Software and Systems Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
Massih-Reza Amini
Professor, University Grenoble Alpes
Artificial Intelligence, Machine Learning, Learning Theory, Information Retrieval
Aurelien Delaitre
Software and Systems Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
Vadim Okun
Software and Systems Division, National Institute of Standards and Technology, Gaithersburg, MD, USA