Specification-Guided Vulnerability Detection with Large Language Models

📅 2025-11-06
🤖 AI Summary
Existing large language models (LLMs) perform poorly at vulnerability detection and have difficulty distinguishing vulnerable code from its patched version. The paper attributes this to a lack of reasoning about security specifications, i.e., the expectations about how code must behave to remain safe. It proposes VulInstruct, a framework that systematically incorporates security specifications into LLM-based vulnerability detection. VulInstruct builds a two-part specification knowledge base: (1) general specifications extracted from high-quality patches across projects, and (2) domain-specific specifications mined from repeated violations in repositories relevant to the target code. At detection time it retrieves relevant past cases and specifications and uses them in the prompt, so that the LLM reasons about expected safe behavior instead of surface patterns. On the PrimeVul benchmark, under strict criteria requiring both a correct prediction and valid reasoning, VulInstruct reaches a 45.0% F1-score (a 32.7% improvement over baselines) and 37.7% recall (a 50.8% improvement), with a 32.3% relative improvement in pairwise evaluation. It uniquely detects 24.3% of the vulnerabilities, 2.4x more than any baseline, and it discovered a previously unknown high-severity vulnerability (CVE-2025-56538) in production code.

📝 Abstract
Large language models (LLMs) have achieved remarkable progress in code understanding tasks. However, they demonstrate limited performance in vulnerability detection and struggle to distinguish vulnerable code from patched code. We argue that LLMs lack understanding of security specifications -- the expectations about how code should behave to remain safe. When code behavior differs from these expectations, it becomes a potential vulnerability. However, such knowledge is rarely explicit in training data, leaving models unable to reason about security flaws. We propose VulInstruct, a specification-guided approach that systematically extracts security specifications from historical vulnerabilities to detect new ones. VulInstruct constructs a specification knowledge base from two perspectives: (i) General specifications from high-quality patches across projects, capturing fundamental safe behaviors; and (ii) Domain-specific specifications from repeated violations in particular repositories relevant to the target code. VulInstruct retrieves relevant past cases and specifications, enabling LLMs to reason about expected safe behaviors rather than relying on surface patterns. We evaluate VulInstruct under strict criteria requiring both correct predictions and valid reasoning. On PrimeVul, VulInstruct achieves 45.0% F1-score (32.7% improvement) and 37.7% recall (50.8% improvement) compared to baselines, while uniquely detecting 24.3% of vulnerabilities -- 2.4x more than any baseline. In pair-wise evaluation, VulInstruct achieves 32.3% relative improvement. VulInstruct also discovered a previously unknown high-severity vulnerability (CVE-2025-56538) in production code, demonstrating practical value for real-world vulnerability discovery. All code and supplementary materials are available at https://github.com/zhuhaopku/VulInstruct-temp.
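
The improvement figures quoted above are easier to interpret with the reference point made explicit. The sketch below assumes each improvement is relative to a baseline score; for recall this is the only possible reading, since 50.8 exceeds 37.7, while for F1 it is an assumption. The function name `implied_baseline` is illustrative and does not come from the paper.

```python
def implied_baseline(score_pct: float, relative_gain_pct: float) -> float:
    """Return the baseline that yields score_pct after a relative gain.

    Solves score = baseline * (1 + gain / 100) for baseline.
    """
    return score_pct / (1.0 + relative_gain_pct / 100.0)


# Scores and gains as quoted in the abstract.
# Reading the gains as relative improvements is an assumption.
reported = {
    "F1": (45.0, 32.7),
    "recall": (37.7, 50.8),
}

for metric, (score, gain) in reported.items():
    base = implied_baseline(score, gain)
    print(f"{metric}: reported {score:.1f}%, implied baseline {base:.1f}%")
```

Under this reading the implied reference points are roughly 33.9% F1 and 25.0% recall. These are back-computed values, not numbers reported in the abstract.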

Problem

Research questions and friction points this paper is trying to address.

LLMs lack the understanding of security specifications that vulnerability detection requires
Models struggle to distinguish vulnerable code from its patched version
A systematic approach is needed to extract security expectations from historical vulnerabilities
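
Telling a vulnerable function apart from its patch is what a pairwise evaluation measures. A minimal sketch follows, assuming a pair counts as correct only when the vulnerable version is flagged and the patched version is cleared. The `PairResult` type and the sample verdicts are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Detector verdicts for one (vulnerable, patched) function pair."""
    flagged_vulnerable_version: bool  # True = detector says "vulnerable"
    flagged_patched_version: bool     # True = detector says "vulnerable"


def pair_is_correct(result: PairResult) -> bool:
    # Correct only if the two versions are told apart in the right direction.
    return result.flagged_vulnerable_version and not result.flagged_patched_version


def pairwise_accuracy(results: list[PairResult]) -> float:
    """Fraction of pairs where both verdicts are right."""
    if not results:
        return 0.0
    return sum(pair_is_correct(r) for r in results) / len(results)


# Hypothetical verdicts, not data from the paper.
sample = [
    PairResult(True, False),   # correct: flags the bug, clears the patch
    PairResult(True, True),    # flags both: does not recognize the fix
    PairResult(False, False),  # clears both: misses the bug
    PairResult(False, True),   # reversed verdicts
]
print(pairwise_accuracy(sample))  # 0.25
```

Under this definition, a detector that labels every function as vulnerable reaches full recall on the vulnerable halves yet scores zero on pairs, which is why pairwise scoring exposes reliance on surface patterns.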

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts security specifications from historical vulnerabilities
Constructs a knowledge base of general and domain-specific specifications
Enables LLMs to reason about expected safe behaviors
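
The three points above can be sketched as a small retrieve-then-prompt pipeline. This is an illustrative sketch, not the authors' implementation: the `Spec` record, the token-overlap (Jaccard) retriever, the example specifications, and the prompt wording are all assumptions, since the summary does not describe the actual retrieval method or prompts.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """One security specification in the knowledge base (hypothetical schema)."""
    tier: str    # "general" or "domain"
    text: str    # expected safe behavior, in natural language
    source: str  # where the specification was derived from


def tokens(text: str) -> set[str]:
    """Lowercased identifier-like tokens; digits and punctuation are dropped."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(code: str, kb: list[Spec], k: int = 2) -> list[Spec]:
    """Return the k specifications whose wording overlaps most with the code.

    Token overlap is a stand-in for whatever retriever a real system uses.
    """
    code_tokens = tokens(code)
    ranked = sorted(kb, key=lambda s: jaccard(code_tokens, tokens(s.text)), reverse=True)
    return ranked[:k]


def build_prompt(code: str, specs: list[Spec]) -> str:
    """Ask the model to check the code against each retrieved specification."""
    spec_lines = "\n".join(f"- [{s.tier}] {s.text} (from: {s.source})" for s in specs)
    return (
        "Security specifications that similar code is expected to satisfy:\n"
        f"{spec_lines}\n\n"
        "Code under review:\n"
        f"{code}\n\n"
        "For each specification, state whether the code satisfies it, "
        "then answer VULNERABLE or SAFE with your reasoning."
    )


# Hypothetical knowledge base entries, written for this example only.
knowledge_base = [
    Spec("general", "Validate the length before a memcpy into a fixed size buffer",
         "cross-project patch"),
    Spec("general", "Free each pointer once and set the pointer to NULL after free",
         "cross-project patch"),
    Spec("domain", "Check that the packet length fits the buffer before parsing the header",
         "target repository history"),
]

target_code = "void parse(char *packet, int length) { char buffer[64]; memcpy(buffer, packet, length); }"

top_specs = retrieve(target_code, knowledge_base)
prompt = build_prompt(target_code, top_specs)
print(prompt)
```

The resulting prompt would be sent to an LLM of choice. The design point is that the model is asked to check the code against stated expectations, one general and one repository-specific here, instead of being asked for a bare verdict.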

Authors

Hao Zhu
Peking University, China

Jia Li
Tsinghua University, China

Cuiyun Gao
Harbin Institute of Technology, China

Jiaru Qian
Peking University, China

Yihong Dong
Peking University
Code Generation, Large Language Models

Huanyu Liu
Peking University, China

Lecheng Wang
Peking University, China

Ziliang Wang
Peking University, China

Xiaolong Hu
Professor of Optical Engineering, Tianjin University
Nanophotonic devices, SNSPD, Quantum photonics, LiDAR, nanofabrication

Ge Li
Full Professor of Computer Science, Peking University
Program Analysis, Program Generation, Deep Learning