🤖 AI Summary
To address the scarcity of vulnerability detection data and the heavy reliance of existing AI-based methods on high-quality labeled data, this paper proposes a multi-agent collaborative vulnerability injection framework. The framework integrates a function-level code understanding agent, static analysis tools, and retrieval-augmented generation (RAG) to enable context-aware, category-specific, and realistic vulnerability injection. It further introduces low-rank adaptation (LoRA) to improve fine-tuning efficiency while ensuring safety and controllability during injection. Experiments across three benchmarks comprising 116 C/C++ functions demonstrate a function-level vulnerability injection success rate of 89%–95%, substantially outperforming baseline approaches. The generated dataset exhibits high fidelity and comprehensive coverage across vulnerability categories, establishing a high-quality, scalable data foundation for training AI-driven vulnerability detection models.
📝 Abstract
The increasing complexity of software systems and the sophistication of cyber-attacks have underscored the critical need for effective automated vulnerability detection and repair systems. Traditional methods, such as static program analysis, face significant challenges related to scalability, adaptability, and high false-positive and false-negative rates. AI-driven approaches, particularly those using machine learning and deep learning models, show promise but are heavily reliant on the quality and quantity of training data. This paper introduces a novel framework designed to automatically introduce realistic, category-specific vulnerabilities into secure C/C++ codebases to generate datasets. The proposed approach coordinates multiple AI agents that simulate expert reasoning, together with function-level agents and traditional code analysis tools. It leverages Retrieval-Augmented Generation (RAG) for contextual grounding and employs Low-Rank Adaptation (LoRA) for efficient model fine-tuning. Our experimental study on 116 code samples from three different benchmarks suggests that our approach outperforms other techniques in dataset accuracy, achieving success rates between 89% and 95% when injecting vulnerabilities at the function level.