Longitudinal Analyses of SAST Tools: A CodeQL Case Study

📅 2026-05-08

🤖 AI Summary

This study addresses the lack of systematic evaluation of the long-term effectiveness, actionability, and stability of Static Application Security Testing (SAST) tools in open-source ecosystems. The authors propose a longitudinal evaluation method for SAST tools and apply it in a large-scale temporal analysis across 114 CodeQL versions, 1,622 repositories, and 3,993 CVEs. CodeQL detected 171 CVEs, 83 of which could be detected by a CodeQL version prior to the fix. For half of the 171 detections, more than 50% of the findings in the vulnerable file were located at the vulnerable location. However, 21 CVEs were no longer detected following a version change and 17 were never redetected, exposing blind spots introduced as the tool evolves. The work provides both a method and empirical evidence for continuously assessing the reliability of SAST tools.

📝 Abstract

Open-source software (OSS) pipelines rely on automated static analysis tools to prevent the introduction of vulnerabilities in code. However, there is limited understanding of the efficacy of these tools across the OSS ecosystem over time. In this paper, we introduce a novel method to evaluate static application security testing (SAST) tools through longitudinal measurements and perform the largest academic study of CodeQL, the most prevalent static analysis tool from GitHub, on OSS codebases. We apply our apparatus to 114 versions of CodeQL and 3,993 CVEs from 1,622 repositories to measure key properties of the tool, analyzing more than 20 billion lines of code in total. First, we measure its effectiveness, i.e., its ability to detect vulnerabilities before they are fixed. Then, we determine whether these detections were actionable through two measures of the distance between findings and the vulnerability location, either over the entire codebase or within the vulnerable file. Finally, we study the stability of CodeQL by examining how vulnerability detections hold across versions and how CodeQL evolves on the accuracy-precision trade-off. We find that CodeQL identifies a total of 171 CVEs, and that 83 of them could be detected by a CodeQL version prior to the fix. Such detections are in general actionable if findings are triaged across files: for 50% of the 171 detections, more than 50% of findings in the vulnerable file are located at the vulnerable location. Finally, we show that CVE detections are not monotonic across versions: 21 CVEs were no longer detected following a version change, and 17 were never redetected. Our study shows that using SAST tools is a matter of best practice, as they prevent numerous vulnerabilities from being introduced, but that developers should be aware of changes that may leave blind spots in detections when the tool is updated.
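
The stability measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' apparatus: the function name `dropped_detections`, the input layout (one list of booleans per CVE, ordered by tool version), and the placeholder CVE labels are all assumptions made for illustration. It flags CVEs whose detection is lost after a version change and records whether any later version detects them again.

```python
from typing import Dict, List


def dropped_detections(history: Dict[str, List[bool]]) -> Dict[str, dict]:
    """Find CVEs whose detection is lost after a version change.

    `history` maps a CVE label to detection flags ordered by tool version
    (hypothetical layout: True means that version reported the CVE).
    """
    report = {}
    for cve, flags in history.items():
        # Version indices where the previous version detected the CVE
        # but this one does not.
        drops = [
            i + 1
            for i in range(len(flags) - 1)
            if flags[i] and not flags[i + 1]
        ]
        if not drops:
            continue  # detection is monotonic for this CVE
        # Redetected if any version at or after the first drop finds it again.
        redetected = any(flags[drops[0]:])
        report[cve] = {"dropped_at": drops, "redetected": redetected}
    return report


# Placeholder labels and flags, not data from the study.
example = {
    "CVE-A": [False, True, True, False, False],  # lost, never redetected
    "CVE-B": [True, False, True, True],          # lost, then redetected
    "CVE-C": [False, True, True],                # never lost
}
result = dropped_detections(example)
never_redetected = [c for c, r in result.items() if not r["redetected"]]
```

In this toy input, `CVE-A` and `CVE-B` both count as non-monotonic detections, and only `CVE-A` is never redetected. A pipeline that pins or updates its SAST tool version could run a comparable check against a benchmark of known vulnerabilities before adopting a new release.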

Problem

Research questions and friction points this paper is trying to address.

SAST tools

longitudinal analysis

CodeQL

vulnerability detection

open-source software

Innovation

Methods, ideas, or system contributions that make the work stand out.

longitudinal analysis

SAST evaluation

CodeQL