LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently exhibit hallucinations in repository-scale code generation, severely compromising correctness and reliability. This work first establishes a fine-grained taxonomy of code hallucinations grounded in real-world development scenarios, systematically identifying and empirically validating four primary causes: contextual misinterpretation, erroneous API inference, logical discontinuities, and dependency omission. We propose a general, lightweight retrieval-augmented generation (RAG) mitigation method that integrates relevant code context via semantic retrieval. Extensive evaluation across six mainstream LLMs demonstrates a statistically significant average reduction in hallucination rates, with consistent performance across models. Our findings are validated through human annotation and multi-model comparative experiments, and we release a fully reproducible, open-source toolkit. This work provides both a theoretical framework and a practical, deployable solution for enhancing the reliability of repository-scale code generation.

Technology Category

Application Category

📝 Abstract
Code generation aims to automatically generate code from input requirements, significantly enhancing development efficiency. Recent large language models (LLMs) based approaches have shown promising results and revolutionized code generation task. Despite the promising performance, LLMs often generate contents with hallucinations, especially for the code generation scenario requiring the handling of complex contextual dependencies in practical development process. Although previous study has analyzed hallucinations in LLM-powered code generation, the study is limited to standalone function generation. In this paper, we conduct an empirical study to study the phenomena, mechanism, and mitigation of LLM hallucinations within more practical and complex development contexts in repository-level generation scenario. First, we manually examine the code generation results from six mainstream LLMs to establish a hallucination taxonomy of LLM-generated code. Next, we elaborate on the phenomenon of hallucinations, analyze their distribution across different models. We then analyze causes of hallucinations and identify four potential factors contributing to hallucinations. Finally, we propose an RAG-based mitigation method, which demonstrates consistent effectiveness in all studied LLMs. The replication package including code, data, and experimental results is available at https://github.com/DeepSoftwareAnalytics/LLMCodingHallucination
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Code Generation
Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Code Generation Errors
Systematic Error Classification
🔎 Similar Papers
No similar papers found.