🤖 AI Summary
Manually authoring security queries for static analysis requires deep expertise in both security and program analysis, which poses a significant barrier to adoption. This paper proposes a method for automatically generating CodeQL queries from CVE metadata. The authors design an LLM-based agent framework built around a custom Model Context Protocol (MCP) interface. The interface combines a Language Server Protocol (LSP) server for syntax guidance, retrieval-augmented generation (RAG) over existing queries and documentation for semantic retrieval, and an execution-feedback loop that lets the agent iteratively refine candidate queries. Evaluated on 176 CVEs across 111 Java projects, the approach produces a correct detection query (one that flags the vulnerable version but not the patched version) for 53.4% of CVEs, compared with 10% for a baseline that uses Claude Code alone. The paper's key contribution is the first deep integration of feedback-driven LLM agents with production-grade program analysis toolchains, bridging the semantic gap between natural-language CVE descriptions and precise, executable static-analysis queries.
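The MCP layer is what gives the agent structured access to the language server and the retrieval database. MCP messages follow JSON-RPC 2.0, and tools are invoked with the `tools/call` method, passing a tool `name` and its `arguments`. The following is a toy, in-process dispatcher that loosely mirrors that request/response shape. It assumes two hypothetical tools, `lsp_complete` (standing in for LSP syntax guidance) and `rag_search` (standing in for semantic retrieval). The tool names, arguments, and hard-coded responses are illustrative only and are not QLCoder's actual interface.

```python
import json
from typing import Any, Callable


# Toy stand-ins for real backends (hypothetical): a CodeQL language server
# offering completions, and a RAG index over queries and documentation.
def lsp_complete(arguments: dict[str, Any]) -> str:
    """Return completions from a hard-coded toy list that match a prefix."""
    candidates = ["DataFlow::ConfigSig", "DataFlow::Global", "DataFlow::Node"]
    return "\n".join(c for c in candidates if c.startswith(arguments["prefix"]))


def rag_search(arguments: dict[str, Any]) -> str:
    """Look up a placeholder snippet by keyword; a real index would rank by similarity."""
    snippets = {"path traversal": "<placeholder: example path-traversal query>",
                "sql injection": "<placeholder: example SQL-injection query>"}
    return snippets.get(arguments["query"].lower(), "no match")


# Registry mapping (hypothetical) tool names to their handlers.
TOOLS: dict[str, Callable[[dict[str, Any]], str]] = {
    "lsp_complete": lsp_complete,
    "rag_search": rag_search,
}


def _error(req_id: Any, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(raw_request: str) -> str:
    """Dispatch one JSON-RPC request; only MCP-style `tools/call` is supported."""
    req = json.loads(raw_request)
    req_id = req.get("id")
    if req.get("method") != "tools/call":
        return _error(req_id, -32601, "Method not found")
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return _error(req_id, -32602, f"Unknown tool: {params.get('name')}")
    text = tool(params.get("arguments", {}))
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "lsp_complete", "arguments": {"prefix": "DataFlow::C"}}}
print(handle(json.dumps(request)))
```

For the request shown, the response's text content is `DataFlow::ConfigSig`. Routing every backend through one narrow, typed entry point is one plausible way to "constrain" the model's reasoning. The agent can only ask for what the registered tools expose, and unknown methods or tools come back as explicit errors rather than silent failures.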
📝 Abstract
Static analysis tools offer a powerful way to detect security vulnerabilities using queries that encode vulnerable code patterns. However, writing such queries is challenging and requires diverse expertise in security and program analysis. To address this challenge, we present QLCoder, an agentic framework that automatically synthesizes queries in CodeQL, a powerful static analysis engine, directly from given CVE metadata. QLCoder embeds an LLM in a synthesis loop with execution feedback, while constraining its reasoning through a custom MCP interface that allows structured interaction with a Language Server Protocol (LSP) server (for syntax guidance) and a RAG database (for semantic retrieval of queries and documentation). This approach allows QLCoder to generate syntactically and semantically valid security queries. We evaluate QLCoder on 176 existing CVEs across 111 Java projects. Building upon the Claude Code agent framework, QLCoder synthesizes correct queries, that is, queries that detect the CVE in the vulnerable version but not in the patched version, for 53.4% of CVEs. In comparison, Claude Code alone synthesizes correct queries for only 10% of CVEs.
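The correctness criterion above (alerts on the vulnerable version, none on the patched version) doubles as a natural feedback signal for the synthesis loop. Here is a minimal sketch of such a loop, assuming a hypothetical `propose` callable that stands in for the LLM (with its MCP tools) and a hypothetical `run` callable that stands in for compiling the query and running CodeQL on both project versions. The feedback messages and iteration budget are illustrative choices, not QLCoder's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    """Outcome of compiling a candidate query and running it on both project versions."""
    compiled: bool
    error: str = ""
    vulnerable_alerts: int = 0  # alerts reported on the vulnerable version
    patched_alerts: int = 0     # alerts reported on the patched version


def feedback(result: RunResult) -> Optional[str]:
    """Return feedback text for the LLM, or None if the query meets the criterion."""
    if not result.compiled:
        return f"Query failed to compile: {result.error}"
    if result.vulnerable_alerts == 0:
        return "No alerts on the vulnerable version; the pattern misses the CVE."
    if result.patched_alerts > 0:
        return "Alerts on the patched version; the pattern is too broad."
    return None  # detects the CVE in the vulnerable version but not the patched one


def synthesize(cve_metadata: str,
               propose: Callable[[str, list[str]], str],
               run: Callable[[str], RunResult],
               max_iterations: int = 5) -> Optional[str]:
    """Hypothetical loop: propose a query, execute it, feed the result back."""
    history: list[str] = []
    for _ in range(max_iterations):
        query = propose(cve_metadata, history)
        message = feedback(run(query))
        if message is None:
            return query
        history.append(message)
    return None  # budget exhausted without a correct query


# Toy stubs: each feedback message prompts a new candidate.
def toy_propose(cve_metadata: str, history: list[str]) -> str:
    return f"candidate-{len(history)}"


TOY_RESULTS = {
    "candidate-0": RunResult(compiled=False, error="unresolved predicate"),
    "candidate-1": RunResult(compiled=True, vulnerable_alerts=3, patched_alerts=1),
    "candidate-2": RunResult(compiled=True, vulnerable_alerts=2, patched_alerts=0),
}

print(synthesize("CVE metadata", toy_propose, TOY_RESULTS.__getitem__))  # candidate-2
```

With the toy stubs, the first candidate fails to compile and the second also fires on the patched version, so the loop returns `candidate-2`. Distinguishing "does not compile", "misses the vulnerability", and "over-reports on the fix" gives the model a more specific signal than a single pass/fail bit.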