🤖 AI Summary
Existing static analysis tools (e.g., CodeQL) suffer from low vulnerability recall on open-source projects because of limited contextual understanding and imprecise generalization of vulnerability patterns. Method: This paper proposes QLPro, presented as the first systematic framework integrating large language models (LLMs) with static code analysis. It uses CodeQL as the underlying engine and employs LLMs to enhance context-aware code comprehension and vulnerability reasoning. A dedicated Java benchmark dataset, JavaTest, is constructed for evaluation. Contribution/Results: On JavaTest, QLPro detects 41 of the 62 known vulnerabilities, 70.8% more than the 24 found by CodeQL, and discovers six previously unknown vulnerabilities, two of which have been confirmed as 0-days. The framework and dataset provide a methodology and empirical benchmark for LLM-augmented static analysis of whole codebases.
📝 Abstract
We introduce QLPro, a vulnerability detection framework that systematically integrates LLMs and static analysis tools to enable comprehensive vulnerability detection across entire open-source projects. We constructed a new dataset, JavaTest, comprising 10 open-source projects from GitHub with 62 confirmed vulnerabilities. CodeQL, a state-of-the-art static analysis tool, detected only 24 of these vulnerabilities, while QLPro detected 41. Furthermore, QLPro discovered 6 previously unknown vulnerabilities, 2 of which have been confirmed as 0-days.
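The detection counts above can be turned into recall figures and a relative gain with a few lines of arithmetic. The sketch below uses only the three numbers stated in the abstract (62, 24, 41); the function and variable names are illustrative and do not come from the paper.

```python
# Figures stated in the abstract for the JavaTest dataset.
TOTAL_KNOWN = 62    # confirmed vulnerabilities in JavaTest
CODEQL_FOUND = 24   # detected by CodeQL
QLPRO_FOUND = 41    # detected by QLPro


def recall(found: int, total: int) -> float:
    """Fraction of known vulnerabilities that a tool detected."""
    return found / total


def relative_gain(new: int, baseline: int) -> float:
    """Increase of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


codeql_recall = recall(CODEQL_FOUND, TOTAL_KNOWN)
qlpro_recall = recall(QLPRO_FOUND, TOTAL_KNOWN)
gain = relative_gain(QLPRO_FOUND, CODEQL_FOUND)

print(f"CodeQL recall: {codeql_recall:.1%}")  # 24 / 62
print(f"QLPro recall:  {qlpro_recall:.1%}")   # 41 / 62
print(f"Relative gain: {gain:.1%}")           # (41 - 24) / 24
```

Under these numbers, CodeQL's recall on JavaTest is about 38.7% and QLPro's about 66.1%. The 70.8% figure is the gain relative to CodeQL's 24 detections, not a difference in recall percentage points.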