QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration

📅 2025-06-30
🤖 AI Summary
Problem: Existing static analysis tools (e.g., CodeQL) suffer from low vulnerability recall in open-source projects due to limited contextual understanding and imprecise generalization of vulnerability patterns. Method: This paper proposes QLPro, a framework that systematically integrates large language models (LLMs) with static code analysis. It uses CodeQL as its underlying engine and employs LLMs to enhance context-aware code comprehension and vulnerability reasoning. A dedicated Java benchmark dataset, JavaTest, is constructed for evaluation. Contribution/Results: On JavaTest, QLPro detects 41 of the 62 known vulnerabilities, 70.8% more than the 24 detected by CodeQL, and discovers six previously unknown vulnerabilities, two of which have been confirmed as 0-days. The results indicate that LLM-augmented static analysis can improve recall for vulnerability discovery across entire projects, and JavaTest offers a benchmark for evaluating such approaches.

📝 Abstract
We introduce QLPro, a vulnerability detection framework that systematically integrates LLMs and static analysis tools to enable comprehensive vulnerability detection across entire open-source projects. We constructed a new dataset, JavaTest, comprising 10 open-source projects from GitHub with 62 confirmed vulnerabilities. CodeQL, a state-of-the-art static analysis tool, detected only 24 of these vulnerabilities, while QLPro detected 41. Furthermore, QLPro discovered 6 previously unknown vulnerabilities, 2 of which have been confirmed as 0-days.
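The reported counts can be turned into recall figures with a few lines of arithmetic. The sketch below is only a worked check of the numbers quoted on this page (62 known vulnerabilities, 24 found by CodeQL, 41 found by QLPro). The function and constant names are illustrative and do not come from the paper.

```python
# Counts reported for the JavaTest dataset.
TOTAL_VULNS = 62    # confirmed vulnerabilities across the 10 projects
CODEQL_FOUND = 24   # detected by CodeQL
QLPRO_FOUND = 41    # detected by QLPro


def recall(found: int, total: int) -> float:
    """Fraction of the known vulnerabilities that were detected."""
    return found / total


def relative_gain(new: int, baseline: int) -> float:
    """Relative increase of `new` over `baseline`."""
    return (new - baseline) / baseline


codeql_recall = recall(CODEQL_FOUND, TOTAL_VULNS)
qlpro_recall = recall(QLPRO_FOUND, TOTAL_VULNS)
gain = relative_gain(QLPRO_FOUND, CODEQL_FOUND)

print(f"CodeQL recall: {codeql_recall:.1%}")  # 38.7%
print(f"QLPro recall:  {qlpro_recall:.1%}")   # 66.1%
print(f"Relative gain: {gain:.1%}")           # 70.8%
```

Note that the 70.8% figure is a relative increase in the number of detected vulnerabilities (17 more than 24), not a recall value. Measured against all 62 known vulnerabilities, recall rises from about 38.7% to about 66.1%.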
Problem

Research questions and friction points this paper is trying to address.

Integrating LLMs and static analysis for vulnerability detection
Improving detection rates compared to standalone static analysis tools
Discovering previously unknown zero-day vulnerabilities in open-source projects
Innovation

Methods, ideas, or system contributions that make the work stand out.

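Integrates LLMs with static code analysis, using CodeQL as the underlying engine
Detects 41 of 62 known vulnerabilities on JavaTest, compared with 24 for CodeQL
Discovers 6 previously unknown vulnerabilities, 2 of which are confirmed 0-days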
Authors

Junze Hu
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Xiangyu Jin
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Yizhe Zeng
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Yuling Liu
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Yunpeng Li
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Dan Du
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Kaiyu Xie
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Hongsong Zhu
Institute of Information Engineering, Chinese Academy of Sciences
cybersecurity, internet measurement