LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models

📅 2025-07-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing deep learning-based vulnerability detectors lose up to 45% accuracy on rigorously verified datasets and are not robust to minor code perturbations, primarily because they model context insufficiently and capture cross-function dependencies poorly. Method: We propose LLMxCPG, a novel framework that uses Code Property Graphs (CPGs) to guide semantic slicing and LLM input construction; graph-structured constraints preserve cross-function vulnerability context while reducing code size by 67.84%–90.93%. Additionally, we design a graph-guided attention mechanism to enhance the LLM’s contextual awareness of vulnerability patterns. Contribution/Results: On verified benchmark datasets, LLMxCPG improves F1-score by 15%–40% over state-of-the-art baselines, enables analysis of larger code segments up to entire projects, and remains robust under syntactic code transformations.

📝 Abstract

Software vulnerabilities present a persistent security challenge, with over 25,000 new vulnerabilities reported in the Common Vulnerabilities and Exposures (CVE) database in 2024 alone. While deep-learning-based approaches show promise for vulnerability detection, recent studies reveal critical limitations in accuracy and robustness: accuracy drops by up to 45% on rigorously verified datasets, and performance degrades significantly under simple code modifications. This paper presents LLMxCPG, a novel framework integrating Code Property Graphs (CPGs) with Large Language Models (LLMs) for robust vulnerability detection. Our CPG-based slice construction technique reduces code size by 67.84% to 90.93% while preserving vulnerability-relevant context. Because this representation is more concise and accurate, it enables the analysis of larger code segments, including entire projects, and is a key factor behind the method's improved detection capability: it can identify vulnerabilities that span multiple functions. Empirical evaluation demonstrates LLMxCPG's effectiveness across verified datasets, achieving 15-40% improvements in F1-score over state-of-the-art baselines. Moreover, LLMxCPG maintains high performance across function-level and multi-function codebases while remaining robust under various syntactic code modifications.
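
The slicing idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a hand-written dependency graph standing in for a real CPG, and the node contents, the `DEPENDS_ON` edges, and the helper names `backward_slice` and `reduction_percent` are all hypothetical. It walks backward from a suspicious statement, keeps only the statements that statement depends on (across two functions), and reports how much of the code was dropped.

```python
from collections import deque

# Hypothetical toy graph: node id -> (enclosing function, statement text).
NODES = {
    1: ("read_input", "char *buf = malloc(len);"),
    2: ("read_input", 'log_debug("allocated");'),
    3: ("read_input", "return buf;"),
    4: ("process", "char *p = read_input(n);"),
    5: ("process", "int counter = 0;"),
    6: ("process", "counter++;"),
    7: ("process", "free(p);"),
    8: ("process", "use(p);"),
}

# Hand-written dependency edges: DEPENDS_ON[n] lists the nodes that n depends on
# (data flow, call/return, or ordering on the same pointer). Assumed, not derived.
DEPENDS_ON = {
    3: [1],
    4: [3],
    6: [5],
    7: [4],
    8: [4, 7],
}


def backward_slice(sink, depends_on):
    """Return, in sorted order, every node the sink transitively depends on, plus the sink."""
    seen = {sink}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for pred in depends_on.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


def reduction_percent(slice_nodes, all_nodes):
    """Percentage of statements removed by slicing, measured by statement count."""
    return 100 * (1 - len(slice_nodes) / len(all_nodes))


slice_ids = backward_slice(8, DEPENDS_ON)  # node 8 is the suspicious use(p)
slice_functions = {NODES[i][0] for i in slice_ids}
slice_text = "\n".join(f"[{NODES[i][0]}] {NODES[i][1]}" for i in slice_ids)
print(slice_text)
print(f"reduction: {reduction_percent(slice_ids, NODES):.1f}%")
```

In this toy case the slice keeps the allocation, the return, the call, the `free`, and the later use, while the logging and counter statements are dropped. The resulting text is what would be handed to a model in place of the full source; a real pipeline would obtain the graph from a CPG tool rather than a hand-written dictionary.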

Problem

Research questions and friction points this paper is trying to address.

Detect software vulnerabilities accurately using LLMs and CPGs

Reduce code size while preserving vulnerability context effectively

Improve robustness against code modifications and multi-function vulnerabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Code Property Graphs with LLMs

Reduces code size by 67.84-90.93%

Improves F1-score by 15-40%
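
For reference, the F1-score cited above is the harmonic mean of precision and recall, computed from true positives (TP), false positives (FP), and false negatives (FN). The abstract does not say whether the 15-40% gain is in absolute points or relative to the baselines, so the standard definition is given without assuming either reading:

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
\]
```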
