🤖 AI Summary
Hand-written static analysis checkers are slow to develop, cover only a narrow set of predefined bug patterns, and generalize poorly to defect patterns not seen before. To address this, the paper proposes KNighter, which it describes as the first approach that uses an LLM to synthesize checkers automatically, guided by historical patch knowledge. Its multi-stage pipeline generates a checker from a patch, validates the checker's correctness against the original patch, and then applies automated refinement to iteratively reduce false positives, so that checker generation is validated, iterative, and traceable. In an evaluation on the Linux kernel, the synthesized checkers were high-precision and detected bug patterns overlooked by existing human-written analyzers: they identified 70 previously unknown bugs/vulnerabilities, of which 56 were confirmed, 41 fixed, and 11 assigned CVE identifiers. The work extends the practical reach of static analysis to bug patterns beyond those covered by hand-written checkers.
📝 Abstract
Static analysis is a powerful technique for bug detection in critical systems like operating system kernels. However, designing and implementing static analyzers is challenging, time-consuming, and typically limited to predefined bug patterns. While large language models (LLMs) have shown promise for static analysis, directly applying them to scan large codebases remains impractical due to computational constraints and contextual limitations. We present KNighter, the first approach that unlocks practical LLM-based static analysis by automatically synthesizing static analyzers from historical bug patterns. Rather than using LLMs to directly analyze massive codebases, our key insight is leveraging LLMs to generate specialized static analyzers guided by historical patch knowledge. KNighter implements this vision through a multi-stage synthesis pipeline that validates checker correctness against original patches and employs an automated refinement process to iteratively reduce false positives. Our evaluation on the Linux kernel demonstrates that KNighter generates high-precision checkers capable of detecting diverse bug patterns overlooked by existing human-written analyzers. To date, KNighter-synthesized checkers have discovered 70 new bugs/vulnerabilities in the Linux kernel, with 56 confirmed and 41 already fixed. 11 of these findings have been assigned CVE numbers. This work establishes an entirely new paradigm for scalable, reliable, and traceable LLM-based static analysis for real-world systems via checker synthesis.
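The idea of validating a checker against its original patch can be pictured with a toy sketch. This is not KNighter's implementation, which the abstract does not show. It assumes, purely for illustration, that a checker is a Python function from source text to a list of reported line numbers, and that candidate checkers (which an LLM would produce in the real pipeline) are supplied as a plain list. All names (`Patch`, `is_valid`, `first_valid`, and the two toy checkers) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Assumption: a "checker" is modeled as a function from source text
# to the 1-based line numbers it reports.
Checker = Callable[[str], List[int]]


@dataclass
class Patch:
    before: str  # buggy code prior to the fix
    after: str   # code after the fix was applied


def is_valid(checker: Checker, patch: Patch) -> bool:
    """Accept a checker only if it reports the pre-patch code
    and stays silent on the post-patch code."""
    return bool(checker(patch.before)) and not checker(patch.after)


def first_valid(candidates: Iterable[Checker], patch: Patch) -> Optional[Checker]:
    """Return the first candidate that passes patch validation, else None."""
    for candidate in candidates:
        if is_valid(candidate, patch):
            return candidate
    return None


def flags_every_alloc(src: str) -> List[int]:
    """Overly broad toy candidate: reports every kzalloc call."""
    return [n for n, line in enumerate(src.splitlines(), 1) if "kzalloc(" in line]


def flags_unchecked_alloc(src: str) -> List[int]:
    """Narrower toy candidate: reports a kzalloc whose result is not
    NULL-checked on the immediately following line."""
    lines = src.splitlines()
    reports = []
    for i, line in enumerate(lines):
        match = re.search(r"(\w+)\s*=\s*kzalloc\(", line)
        if not match:
            continue
        var = match.group(1)
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if f"if (!{var})" not in next_line:
            reports.append(i + 1)
    return reports


# Hypothetical patch that adds a missing NULL check after an allocation.
PATCH = Patch(
    before="p = kzalloc(sz, GFP_KERNEL);\np->len = sz;\n",
    after="p = kzalloc(sz, GFP_KERNEL);\nif (!p)\n\treturn -ENOMEM;\np->len = sz;\n",
)

chosen = first_valid([flags_every_alloc, flags_unchecked_alloc], PATCH)
```

In this sketch the broad candidate is rejected because it still reports the fixed code, while the narrower one passes. A checker that passes this gate can still raise false positives elsewhere in a large codebase, which is the role the abstract assigns to the separate refinement stage.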