VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection

📅 2026-05-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing learning-based vulnerability detectors struggle to model the structural dependencies, domain knowledge, and complex semantics of code, and large language models (LLMs) prompted with raw source code suffer from high false-negative and false-positive rates. This work proposes VulTriage, a triple-path context augmentation framework that integrates structural (AST/CFG/DFG), knowledge-based (CWE pattern retrieval), and semantic (functional summary) contexts into a unified instruction that guides the LLM's vulnerability reasoning. Combining program analysis, hybrid dense-sparse retrieval, and prompt engineering, the method reports state-of-the-art results on the PrimeVul pair test set, outperforming existing deep learning and LLM baselines on key pair-wise and classification metrics, and it generalizes to a low-resource, class-imbalanced Kotlin dataset.
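The summary mentions hybrid dense-sparse retrieval without saying how the two rankings are combined. As a minimal sketch, assuming reciprocal rank fusion (a common fusion technique, not confirmed as the paper's choice), the snippet below merges a dense and a sparse ranking of CWE entries. The function name, the constant `k = 60`, and the example CWE rankings are illustrative assumptions.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one list (hypothetical helper).

    Each item receives 1 / (k + rank) from every ranking it appears in;
    items are returned by descending fused score.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative rankings only: one from an embedding model (dense),
# one from a keyword scorer such as BM25 (sparse).
dense_ranking = ["CWE-787", "CWE-125", "CWE-416"]
sparse_ranking = ["CWE-125", "CWE-476", "CWE-787"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Entries that both retrievers rank highly rise to the top, while an entry found by only one retriever still survives in the fused list.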
📝 Abstract
Automated vulnerability detection is a fundamental task in software security, yet existing learning-based methods still struggle to capture the structural dependencies, domain-specific vulnerability knowledge, and complex program semantics required for accurate detection. Recent Large Language Models (LLMs) have shown strong code understanding ability, but directly prompting them with raw source code often leads to missed vulnerabilities or false alarms, especially when vulnerable and benign functions differ only in subtle semantic details. To address this, we propose VulTriage, a triple-path context augmentation framework for LLM-based vulnerability detection. VulTriage enhances the LLM input through three complementary paths: a Control Path that extracts and verbalizes AST, CFG, and DFG information to expose control and data dependencies; a Knowledge Path that retrieves relevant CWE-derived vulnerability patterns and examples through hybrid dense-sparse retrieval; and a Semantic Path that summarizes the functional behavior of the code before the final judgment. These contexts are integrated into a unified instruction to guide the LLM toward more reliable vulnerability reasoning. Experiments on the PrimeVul pair test set show that VulTriage achieves state-of-the-art performance, outperforming existing deep learning and LLM-based baselines on key pair-wise and classification metrics. Further ablation studies verify the effectiveness of each path, and additional experiments on the Kotlin dataset demonstrate the generalization ability of VulTriage under low-resource and class-imbalanced settings. Our code is available at https://github.com/vinsontang1/VulTriage
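To make the idea of a unified instruction concrete, here is a rough sketch of how the three contexts might be assembled into a single prompt. The class `TriplePathContext`, the function `build_instruction`, the section headings, and the answer format are all hypothetical; the abstract does not specify the actual template, which may differ in the released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriplePathContext:
    """Hypothetical container for the three context paths."""
    structure: str        # verbalized AST/CFG/DFG facts (Control Path)
    knowledge: List[str]  # retrieved CWE-derived patterns (Knowledge Path)
    summary: str          # functional summary of the code (Semantic Path)


def build_instruction(code: str, ctx: TriplePathContext) -> str:
    """Join the three contexts and the code into one prompt string."""
    knowledge = "\n".join(f"- {item}" for item in ctx.knowledge) or "- (none retrieved)"
    sections = [
        "Decide whether the following function contains a vulnerability.",
        "## Structural context\n" + ctx.structure,
        "## Retrieved vulnerability knowledge\n" + knowledge,
        "## Functional summary\n" + ctx.summary,
        "## Code\n" + code,
        "Answer with VULNERABLE or BENIGN and a one-sentence reason.",
    ]
    return "\n\n".join(sections)


# Illustrative inputs only; none of these strings come from the paper.
ctx = TriplePathContext(
    structure="`len` flows into the `memcpy` size argument without a bounds check.",
    knowledge=["CWE-787: out-of-bounds write when a copy length exceeds the buffer size."],
    summary="Copies a caller-supplied payload into a fixed-size stack buffer.",
)
prompt = build_instruction("void f(char *p, int len) { char b[16]; memcpy(b, p, len); }", ctx)
print(prompt)
```

Placing the summary and retrieved patterns ahead of the code is one possible ordering; the final judgment is requested last so the model reads all three contexts first.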
Problem

Research questions and friction points this paper is trying to address.

vulnerability detection
Large Language Models
program semantics
structural dependencies
false alarms
Innovation

Methods, ideas, or system contributions that make the work stand out.

triple-path context augmentation
LLM-based vulnerability detection
program semantics
hybrid retrieval
control/data flow verbalization
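As a toy illustration of control/data flow verbalization, the sketch below turns a few syntactic facts into plain sentences. It uses Python's standard `ast` module on Python source purely as a stand-in: PrimeVul targets C/C++, and the paper's actual AST/CFG/DFG extraction tooling is not described here. The function `verbalize_structure` and the sentence templates are assumptions. `ast.unparse` requires Python 3.9 or later.

```python
import ast
from typing import List, Tuple

SAMPLE = (
    "def copy(buf, n):\n"
    "    size = len(buf)\n"
    "    if n > size:\n"
    "        n = size\n"
    "    out = buf[:n]\n"
    "    return out\n"
)


def verbalize_structure(source: str) -> List[str]:
    """Describe branches, loops, and assignments as sentences (hypothetical helper)."""
    facts: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            # Control dependency: which condition guards the block.
            facts.append((node.lineno, f"Line {node.lineno}: branch on `{ast.unparse(node.test)}`."))
        elif isinstance(node, (ast.For, ast.While)):
            facts.append((node.lineno, f"Line {node.lineno}: loop."))
        elif isinstance(node, ast.Assign):
            # Data dependency: which names the assigned value reads.
            targets = ", ".join(ast.unparse(t) for t in node.targets)
            used = sorted({n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)})
            reads = ", ".join(used) if used else "no variables"
            facts.append((node.lineno, f"Line {node.lineno}: `{targets}` is computed from {reads}."))
    # ast.walk is breadth-first, so sort by line number for readable output.
    return [text for _, text in sorted(facts)]


facts = verbalize_structure(SAMPLE)
for fact in facts:
    print(fact)
```

A real pipeline would derive these facts from a proper control-flow and data-flow graph; this version only reads them off the syntax tree.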