An Empirical Analysis of Static Analysis Methods for Detection and Mitigation of Code Library Hallucinations

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate library hallucinations—invoking non-existent library functions—thereby severely compromising code correctness. This work presents the first systematic evaluation of static analysis tools in detecting such hallucinations, conducting large-scale experiments that integrate multiple static analyzers, prominent LLMs, and natural-language-to-code datasets, complemented by manual analysis. The study reveals that static analysis can detect 16%–70% of all errors and 14%–85% of library hallucinations, with performance varying by LLM and dataset. Crucially, the theoretical upper bound for detectability is limited to 48.5%–77%, exposing inherent limitations of static analysis for this task. By establishing the first empirical benchmark and quantifying fundamental detection boundaries, this research delineates clear directions for future work on mitigating library hallucinations in code generation.
📝 Abstract
Despite extensive research, Large Language Models continue to hallucinate when generating code, particularly when using libraries. On NL-to-code benchmarks that require library use, we find that LLMs generate code that uses non-existent library features in 8.1–40% of responses. One intuitive approach for detecting and mitigating hallucinations is static analysis. In this paper, we analyse the potential of static analysis tools, both in terms of what they can solve and what they cannot. We find that static analysis tools can detect 16–70% of all errors, and 14–85% of library hallucinations, with performance varying by LLM and dataset. Through manual analysis, we identify cases that no static method could plausibly catch, which places an upper bound on their potential of 48.5–77%. Overall, we show that static analysis is a cheap method for addressing some forms of hallucination, and we quantify how far short of solving the problem it will always be.
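To make the core idea concrete, here is a minimal sketch of one kind of static check the paper evaluates: flagging calls to library attributes that do not exist. This is an illustrative toy, not the paper's actual tooling; the function name `find_unknown_calls` and the hallucinated call `math.cbrt_exact` are invented for this example, and real analyzers handle far more (aliases, submodules, dynamic attributes).

```python
import ast
import importlib

def find_unknown_calls(source: str, module_name: str) -> list[str]:
    """Return names of attributes called on `module_name` that the
    imported module does not actually provide (a toy hallucination check)."""
    module = importlib.import_module(module_name)
    unknown = []
    for node in ast.walk(ast.parse(source)):
        # Match calls of the form `module_name.attr(...)`.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module_name):
            if not hasattr(module, node.func.attr):
                unknown.append(node.func.attr)
    return unknown

# `math.cbrt_exact` is a deliberately hallucinated function name.
code = (
    "import math\n"
    "print(math.sqrt(2))\n"
    "print(math.cbrt_exact(8))\n"
)
print(find_unknown_calls(code, "math"))  # → ['cbrt_exact']
```

Checks like this are cheap, which is the paper's point, but they also miss hallucinations that type-check syntactically (e.g. a real function called with semantically wrong arguments), which is one source of the upper bound the authors quantify.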
Problem

Research questions and friction points this paper is trying to address.

code hallucination
library usage
large language models
static analysis
NL-to-code
Innovation

Methods, ideas, or system contributions that make the work stand out.

static analysis
code hallucination
large language models
library API misuse
empirical evaluation