Detecting and Correcting Hallucinations in LLM-Generated Code via Deterministic AST Analysis

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses Knowledge Conflicting Hallucinations (KCHs) in code generated by large language models—such as the use of non-existent API parameters—which frequently cause runtime errors and evade detection by conventional tools. The authors propose the first fully deterministic, non-execution-based post-processing framework that combines abstract syntax tree (AST) parsing with a dynamically constructed library knowledge base. By leveraging static analysis and a rule engine, the method achieves high-precision detection and automatic repair of KCHs at both the API and identifier levels, without relying on probabilistic models or LLM re-generation. Evaluated on 200 handcrafted Python code snippets, the approach attains 100% precision, 87.6% recall (F1 = 0.934), and successfully corrects 77.0% of hallucinated errors.
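The "dynamically constructed library knowledge base" can be illustrated with a minimal sketch (not the authors' implementation): runtime introspection of a library's functions yields the set of parameter names each one actually accepts. The `build_kb` helper below is a hypothetical name; the example introspects the standard-library `json` module.

```python
import inspect
import json

def build_kb(module, names):
    """Build a tiny knowledge base mapping function names to the
    parameter names their signatures actually accept, via introspection."""
    kb = {}
    for name in names:
        func = getattr(module, name)
        kb[name] = list(inspect.signature(func).parameters)
    return kb

# Introspect two functions from the stdlib json module.
kb = build_kb(json, ["dumps", "loads"])
print("indent" in kb["dumps"])  # a real parameter of json.dumps -> True
```

A hallucinated keyword argument is then simply any name absent from the recorded parameter list, which makes detection deterministic rather than probabilistic.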

📝 Abstract
Large Language Models (LLMs) for code generation boost productivity but frequently introduce Knowledge Conflicting Hallucinations (KCHs): subtle semantic errors, such as non-existent API parameters, that evade linters and cause runtime failures. Existing mitigations, such as constrained decoding or non-deterministic LLM-in-the-loop repair, are often unreliable for these errors. This paper investigates whether a deterministic static-analysis framework can reliably detect *and* auto-correct KCHs. We propose a post-processing framework that parses generated code into an Abstract Syntax Tree (AST) and validates it against a dynamically generated Knowledge Base (KB) built via library introspection. This non-executing approach uses deterministic rules to find and fix both API- and identifier-level conflicts. On a manually curated dataset of 200 Python snippets, our framework detected KCHs with 100% precision and 87.6% recall (0.934 F1-score), and successfully auto-corrected 77.0% of all identified hallucinations. Our findings demonstrate that this deterministic post-processing approach is a viable and reliable alternative to probabilistic repair, offering a clear path toward trustworthy code generation.
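The detect-and-repair pipeline described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' system: the one-entry `KB` dictionary is hand-written for the example, and `difflib` closest-match renaming stands in for whatever repair rules the paper's rule engine actually uses.

```python
import ast
import difflib

# Hypothetical KB entry: qualified function name -> valid keyword parameters.
KB = {"json.dumps": {"obj", "skipkeys", "ensure_ascii", "indent", "sort_keys"}}

def qualified_name(call):
    """Return 'module.func' for calls like json.dumps(...), else None."""
    f = call.func
    if isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name):
        return f"{f.value.id}.{f.attr}"
    return None

def detect_and_fix(source, kb):
    """Flag keyword arguments absent from the KB and repair each one by
    renaming it to the closest valid parameter (a deterministic rule)."""
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = qualified_name(node)
            if name in kb:
                for kw in node.keywords:
                    if kw.arg is not None and kw.arg not in kb[name]:
                        match = difflib.get_close_matches(kw.arg, kb[name], n=1)
                        if match:
                            findings.append((kw.arg, match[0]))
                            kw.arg = match[0]  # repair the AST in place
    return ast.unparse(tree), findings

fixed, found = detect_and_fix("json.dumps(data, indnet=2)", KB)
print(found)  # [('indnet', 'indent')]
print(fixed)  # json.dumps(data, indent=2)
```

Because both the AST walk and the renaming rule are deterministic, the same input always yields the same detection and repair, in contrast to LLM-in-the-loop regeneration.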
Problem

Research questions and friction points this paper is trying to address.

Hallucinations
Code Generation
Large Language Models
Knowledge Conflicting Hallucinations
Semantic Errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic AST Analysis
Knowledge Conflicting Hallucinations
Static Code Analysis
Knowledge Base Construction
LLM Code Correction
Dipin Khati
William & Mary
Daniel Rodríguez-Cárdenas
William & Mary
Paul Pantzer
William & Mary
Denys Poshyvanyk
Chancellor Professor of Computer Science, William & Mary
software engineering · software analytics · software evolution · software maintenance · program