🤖 AI Summary
Existing vulnerability detection methods suffer either from insufficient contextual scope, when restricted to single-file or function-level analysis, or from excessive noise and computational overhead, when applied to entire repositories. This paper proposes a program-analysis-driven context enhancement framework that integrates static analysis (control-flow graphs, data-flow analysis, and call graphs) with large language models (LLMs) such as GPT-4, DeepSeek, and CodeLlama. Its core contribution is a multi-granularity, behavior-aware context abstraction mechanism: program semantics guide the selection of salient contextual elements while suppressing irrelevant noise, and empirical analysis reveals model-specific sensitivities to abstraction granularity, enabling principled identification of optimal abstraction levels. Experiments demonstrate substantial improvements in vulnerability detection performance, with an average 18.7% gain in F1-score, and the study establishes a strong correlation between a model's code understanding capability and its empirically determined optimal abstraction level.
📝 Abstract
Vulnerability detection is a critical aspect of software security. Accurate detection is essential to prevent potential security breaches and protect software systems from malicious attacks. Recently, vulnerability detection methods leveraging deep learning and large language models (LLMs) have garnered increasing attention. However, existing approaches often focus on analyzing individual files or functions, which limits their ability to gather sufficient contextual information, while analyzing entire repositories to gather context introduces significant noise and computational overhead. To address these challenges, we propose a context-enhanced vulnerability detection approach that combines program analysis with LLMs. Specifically, we use program analysis to extract contextual information at various levels of abstraction, thereby filtering out irrelevant noise. The abstracted context, along with the source code, is provided to the LLM for vulnerability detection. We investigate how different levels of contextual granularity improve LLM-based vulnerability detection performance. Our goal is to strike a balance between providing sufficient detail to accurately capture vulnerabilities and minimizing unnecessary complexity that could hinder model performance. Based on an extensive study using GPT-4, DeepSeek, and CodeLlama with various prompting strategies, our key findings include: (1) incorporating abstracted context significantly enhances vulnerability detection effectiveness; (2) different models benefit from distinct levels of abstraction depending on their code understanding capabilities; and (3) capturing program behavior through program analysis is a promising direction for general LLM-based code analysis tasks and merits further attention.
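To make the described pipeline concrete, here is a minimal sketch of the idea, not the paper's actual implementation: a call graph (one of the static analyses mentioned) selects the callees relevant to a target function, an abstraction level controls how much of their code is kept (here, a hypothetical "signature" level that retains only headers), and the abstracted context is combined with the source into an LLM prompt. All function names and the toy source are illustrative assumptions.

```python
import ast
import textwrap

# Toy "repository" source: a target function plus its callees.
SOURCE = textwrap.dedent("""
    def read_len(buf):
        return buf[0]

    def copy_data(dst, src, n):
        for i in range(n):
            dst[i] = src[i]

    def handle_packet(buf, out):
        n = read_len(buf)
        copy_data(out, buf, n)   # no bounds check on n
""")

def call_graph(tree):
    """Map each function name to the names it calls (single-file approximation)."""
    graph = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
    return graph

def abstract_context(tree, names, level):
    """Render callee context at a chosen abstraction level:
    'signature' keeps only function headers; 'full' keeps whole bodies."""
    parts = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name in names:
            if level == "signature":
                args = ", ".join(a.arg for a in node.args.args)
                parts.append(f"def {node.name}({args}): ...")
            else:
                parts.append(ast.unparse(node))
    return "\n".join(parts)

tree = ast.parse(SOURCE)
graph = call_graph(tree)
target = "handle_packet"

# Abstracted context for the target's callees, then the final prompt.
context = abstract_context(tree, graph[target], level="signature")
prompt = (
    f"Context (abstracted callees):\n{context}\n\n"
    f"Analyze the function `{target}` for vulnerabilities."
)
print(prompt)
```

In a real system the call graph would span the repository and the abstraction levels would be richer (e.g., data-flow summaries), but the shape is the same: program analysis picks *which* context to include, and the abstraction level picks *how much* of it the model sees.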