Detecting and Correcting Hallucinations in LLM-Generated Code via Deterministic AST Analysis

📅 2026-01-27

🤖 AI Summary

This work addresses Knowledge Conflicting Hallucinations (KCHs) in code generated by large language models. These are errors, such as the use of non-existent API parameters, that frequently cause runtime failures and evade conventional tools such as linters. The authors propose a fully deterministic, non-executing post-processing framework that combines abstract syntax tree (AST) parsing with a dynamically constructed knowledge base of library APIs. Using static analysis and a rule engine, the method detects and automatically repairs KCHs at both the API and identifier levels, without relying on probabilistic models or LLM re-generation. Evaluated on a manually curated dataset of 200 Python code snippets, the approach attains 100% precision and 87.6% recall (F1 = 0.934), and auto-corrects 77.0% of the identified hallucinations.
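
As a quick consistency check on the reported figures (not part of the paper itself), the F1 score is the harmonic mean of precision P and recall R. Plugging in the stated values reproduces the reported 0.934:

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 1.000 \times 0.876}{1.000 + 0.876}
    = \frac{1.752}{1.876}
    \approx 0.934
\]
```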

📝 Abstract

Large Language Models (LLMs) for code generation boost productivity but frequently introduce Knowledge Conflicting Hallucinations (KCHs): subtle semantic errors, such as non-existent API parameters, that evade linters and cause runtime failures. Existing mitigations, such as constrained decoding or non-deterministic LLM-in-the-loop repair, are often unreliable for these errors. This paper investigates whether a deterministic, static-analysis framework can reliably detect and auto-correct KCHs. We propose a post-processing framework that parses generated code into an Abstract Syntax Tree (AST) and validates it against a dynamically generated Knowledge Base (KB) built via library introspection. This non-executing approach uses deterministic rules to find and fix both API-level and identifier-level conflicts. On a manually curated dataset of 200 Python snippets, our framework detected KCHs with 100% precision and 87.6% recall (0.934 F1-score), and successfully auto-corrected 77.0% of all identified hallucinations. Our findings demonstrate that this deterministic post-processing approach is a viable and reliable alternative to probabilistic repair, offering a clear path toward trustworthy code generation.
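
The pipeline described in the abstract (parse to an AST, build a KB by introspection, apply deterministic rules) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_kb` and `find_conflicts`, the tuple layout of the findings, and the use of `difflib` closest-match lookup as the repair rule are all assumptions made for illustration. The sketch also assumes the library is referenced by its plain module name (no `import ... as` aliases) and handles only keyword arguments and attribute calls. The generated snippet is parsed but never executed; only the library itself is imported for introspection.

```python
import ast
import difflib
import importlib
import inspect


def build_kb(module_name):
    """Hypothetical KB: public callable name -> (keyword names, accepts **kwargs)."""
    module = importlib.import_module(module_name)
    kb = {}
    for name, obj in vars(module).items():
        if name.startswith("_") or not callable(obj):
            continue
        try:
            params = inspect.signature(obj).parameters
        except (TypeError, ValueError):
            continue  # no introspectable signature: skip rather than guess
        accepts_any = any(p.kind is p.VAR_KEYWORD for p in params.values())
        names = {n for n, p in params.items()
                 if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)}
        kb[name] = (names, accepts_any)
    return kb


def closest(name, candidates):
    """Deterministic repair hint: nearest known name, or None (assumed rule)."""
    matches = difflib.get_close_matches(name, sorted(candidates), n=1)
    return matches[0] if matches else None


def find_conflicts(source, module_name):
    """Return (kind, function, offending_name, suggestion) for each conflict."""
    kb = build_kb(module_name)
    findings = []
    for node in ast.walk(ast.parse(source)):  # parse only; never executed
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module_name):
            continue
        func = node.func.attr
        if func not in kb:
            # Identifier-level conflict: the function does not exist.
            findings.append(("unknown_function", func, func, closest(func, kb)))
            continue
        names, accepts_any = kb[func]
        if accepts_any:
            continue  # **kwargs accepts anything; cannot be judged statically
        for kw in node.keywords:
            if kw.arg is not None and kw.arg not in names:
                # API-level conflict: the keyword parameter does not exist.
                findings.append(("unknown_parameter", func, kw.arg,
                                 closest(kw.arg, names)))
    return findings


if __name__ == "__main__":
    snippet = (
        "import shutil\n"
        "shutil.copyfile('a', 'b', follow_symlink=True)\n"
        "shutil.copy_file('a', 'b')\n"
    )
    for finding in find_conflicts(snippet, "shutil"):
        print(finding)
```

Skipping callables that accept `**kwargs` or lack an introspectable signature trades recall for precision: the sketch only reports a conflict when the KB can prove the name is absent, which mirrors the high-precision, lower-recall profile reported above.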

Problem

Research questions and friction points this paper is trying to address.

Hallucinations

Code Generation

Large Language Models

Knowledge Conflicting Hallucinations

Semantic Errors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic AST Analysis

Knowledge Conflicting Hallucinations

Static Code Analysis