CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

πŸ“… 2025-01-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the limited structural understanding, accuracy, and efficiency of large language models (LLMs) when detecting security vulnerabilities in long code, this paper proposes a structure-aware soft prompt tuning method, CGP-Tuning. Unlike full-parameter fine-tuning, the approach freezes the LLM's parameters and instead models the structural semantics of code via type-aware code graph embeddings. A lightweight, linear-complexity cross-modal alignment module then fuses the graph-structured representations with textual semantics. The key contributions are type-aware graph representation learning for code vulnerability detection and a low-overhead graph–text alignment mechanism that preserves semantic richness while keeping computation efficient. Evaluated on the DiverseVul benchmark with recent open-source code LLMs, the method outperforms the best state-of-the-art approach by an average of 3.5 percentage points in accuracy, without losing detection capability on long code samples.
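
As a rough illustration of what type-aware node embeddings could look like, the sketch below (plain PyTorch; the class name, dimensions, and mean-pooling scheme are illustrative assumptions, not the paper's exact design) combines an embedding of each code-graph node's type with pooled embeddings of the node's code tokens:

```python
import torch
import torch.nn as nn

class TypeAwareNodeEmbedding(nn.Module):
    """Illustrative sketch: embed each code-graph node from its node type and its code tokens."""

    def __init__(self, num_node_types: int, vocab_size: int, dim: int):
        super().__init__()
        self.type_emb = nn.Embedding(num_node_types, dim)   # e.g. FunctionDecl, IfStmt, CallExpr, ...
        self.token_emb = nn.Embedding(vocab_size, dim)       # sub-tokens of the code attached to the node
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, node_types, node_tokens, token_mask):
        # node_types:  (N,)    integer node-type ids
        # node_tokens: (N, T)  integer token ids, padded
        # token_mask:  (N, T)  1 for real tokens, 0 for padding
        type_vec = self.type_emb(node_types)                               # (N, dim)
        tok = self.token_emb(node_tokens)                                  # (N, T, dim)
        mask = token_mask.float().unsqueeze(-1)                            # (N, T, 1)
        tok_vec = (tok * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)   # mean over real tokens
        return self.proj(torch.cat([type_vec, tok_vec], dim=-1))           # (N, dim) type-aware node embedding
```

The resulting node embeddings would then feed a graph encoder before being aligned with the code text, as described in the abstract below.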

πŸ“ Abstract
Large language models (LLMs) have been proposed as powerful tools for detecting software vulnerabilities, where task-specific fine-tuning is typically employed to provide vulnerability-specific knowledge to the LLMs for this purpose. However, traditional full-parameter fine-tuning is inefficient for modern, complex LLMs, which contain billions of parameters. Soft prompt tuning has been suggested as a more efficient alternative for fine-tuning LLMs in general cases. However, pure soft prompt tuning treats source code as plain text, losing structural information inherent in source code. Meanwhile, graph-enhanced soft prompt tuning methods, which aim to address this issue, are unable to preserve the rich semantic information within code graphs, as they are primarily designed for general graph-related tasks and focus more on adjacency information. They also fail to ensure computational efficiency while accounting for graph-text interactions. This paper, therefore, introduces a new code graph-enhanced, structure-aware soft prompt tuning method for vulnerability detection, referred to as CGP-Tuning. It employs innovative type-aware embeddings to capture the rich semantic information within code graphs, along with a novel and efficient cross-modal alignment module that achieves linear computational cost while incorporating graph-text interactions. The proposed CGP-Tuning is evaluated on the latest DiverseVul dataset and the most recent open-source code LLMs, CodeLlama and CodeGemma. Experimental results demonstrate that CGP-Tuning outperforms the best state-of-the-art method by an average of 3.5 percentage points in accuracy, without compromising its vulnerability detection capabilities for long source code.
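
To make the linear-cost alignment idea concrete, here is a hedged sketch of one way graph and text features could be fused into a fixed number of soft prompt vectors for a frozen LLM, using a small set of learnable query slots. The module name, head count, and dimensions are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class LinearCrossModalAlignment(nn.Module):
    """Sketch: fuse graph node embeddings and code-token features into K soft prompt vectors.

    A small, fixed number of learnable query slots keeps the attention cost
    roughly O(K * (N + L)) in the number of graph nodes N and code tokens L,
    rather than the O(N * L) of full graph-text cross-attention.
    """

    def __init__(self, dim: int, num_prompts: int = 16):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn_graph = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.attn_text = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, graph_nodes, text_tokens):
        # graph_nodes: (B, N, dim)  node embeddings from the code-graph encoder
        # text_tokens: (B, L, dim)  token embeddings of the source code from the frozen LLM
        q = self.queries.unsqueeze(0).expand(graph_nodes.size(0), -1, -1)  # (B, K, dim)
        g, _ = self.attn_graph(q, graph_nodes, graph_nodes)                # (B, K, dim)
        t, _ = self.attn_text(q, text_tokens, text_tokens)                 # (B, K, dim)
        return self.merge(torch.cat([g, t], dim=-1))                       # (B, K, dim) soft prompts
```

In a soft prompt tuning setup like the one the abstract describes, these K vectors would be prepended to the frozen LLM's input embeddings, so only the alignment module and the graph encoder are trained.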
Problem

Research questions and friction points this paper is trying to address.

Code Understanding
Vulnerability Detection
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

CGP-Tuning
Code Semantic Analysis
Vulnerability Detection Accuracy