Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates why large language models (LLMs) exhibit factual hallucinations, i.e., erroneous outputs about known facts, when fine-tuned to incorporate new knowledge. To diagnose the root cause systematically, the authors construct the fine-grained, controllable Biography-Reasoning dataset, revealing that high unfamiliarity of a newly introduced knowledge type, rather than the overall proportion of new knowledge, is the primary driver of hallucination, with effects that transfer across knowledge types. Building on this insight, they propose KnownPatch: a lightweight, interpretable intervention that injects a small set of verified, pre-existing knowledge samples during the late stage of fine-tuning. Experiments show that KnownPatch substantially suppresses hallucinations triggered by new knowledge, restores the model's attention to critical entities, and improves both accuracy and factual consistency on knowledge-intensive question answering and reasoning tasks, validating the approach both mechanistically and empirically.

📝 Abstract
Previous studies show that introducing new knowledge during fine-tuning of large language models (LLMs) can lead to erroneous output when the models are tested on known information, thereby triggering factual hallucinations. However, existing studies have not deeply investigated the specific manifestations and underlying mechanisms of these hallucinations. Our work addresses this gap by designing a controlled dataset, Biography-Reasoning, and conducting a fine-grained analysis across multiple knowledge types and two task types: knowledge question answering (QA) and knowledge reasoning. We find that when an LLM is fine-tuned on a dataset in which a specific knowledge type consists entirely of new knowledge, it exhibits significantly increased hallucination tendencies. This suggests that the high unfamiliarity of a particular knowledge type, rather than the overall proportion of new knowledge, is the stronger driver of hallucinations, and these tendencies can even spread to other knowledge types in QA tasks. To mitigate such factual hallucinations, we propose KnownPatch, which patches a small number of known knowledge samples into the later stages of training, effectively alleviating new-knowledge-induced hallucinations. Through attention analysis, we find that learning new knowledge reduces the model's attention to key entities in the question, causing excessive focus on the surrounding context and thereby increasing the risk of hallucination. Moreover, this attention pattern can propagate to similar contexts, spreading hallucinations to textually similar questions. Our method effectively limits the disruption that new-knowledge learning causes to the model's attention on key entities, and improves performance as well.
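The attention finding above suggests a simple diagnostic: measure how much of a query token's attention mass lands on the key-entity tokens versus the surrounding context, and track how that share drops as new knowledge is learned. The sketch below is illustrative only; the function name `entity_attention_share` and its inputs are assumptions, not an API from the paper, and it operates on a single attention row (one query position of one head) given as a list of floats.

```python
def entity_attention_share(attn_weights, entity_positions):
    """Fraction of one query token's attention mass that falls on
    the key-entity token positions (as opposed to the context).

    attn_weights: attention row for a single query token (e.g. the
        final question token), a list of non-negative floats.
    entity_positions: indices of the key-entity tokens in the input.
    """
    total = sum(attn_weights)
    if total == 0:
        return 0.0
    return sum(attn_weights[i] for i in entity_positions) / total

# Toy example: attention row over 5 tokens; the entity spans positions 1-2.
row = [0.05, 0.40, 0.35, 0.10, 0.10]
share = entity_attention_share(row, [1, 2])  # 0.75
```

In practice the row would come from a model's attention tensors (e.g. with attention outputs enabled in a transformer library); a falling entity share after fine-tuning on new knowledge would match the paper's reported pattern.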
Problem

Research questions and friction points this paper is trying to address.

Analyzes how new knowledge introduced during fine-tuning causes factual hallucinations in LLMs
Identifies knowledge-type unfamiliarity as the primary driver of hallucination tendencies
Proposes the KnownPatch method to mitigate hallucinations by restoring attention to key entities
Innovation

Methods, ideas, or system contributions that make the work stand out.

KnownPatch adds known knowledge in late training stages
Method reduces hallucinations by adjusting attention patterns
Technique maintains focus on key entities in questions
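The late-stage patching idea can be sketched as a data-scheduling routine: train on the new-knowledge samples as usual, then, from some late epoch onward, mix a small number of verified known samples into each epoch's data. This is a minimal sketch under stated assumptions; the function name, the 10% patch ratio, and the 80% start fraction are illustrative defaults, not the paper's hyperparameters.

```python
import random

def build_epoch_data(new_samples, known_samples, total_epochs,
                     patch_start_frac=0.8, patch_ratio=0.1, seed=0):
    """Yield (epoch, shuffled training data) per epoch.

    For the final (1 - patch_start_frac) of training, a small sample of
    known-knowledge items (patch_ratio of the new-knowledge set size,
    at least one) is mixed into each epoch's data.
    """
    rng = random.Random(seed)
    patch_start = int(total_epochs * patch_start_frac)
    for epoch in range(total_epochs):
        data = list(new_samples)
        if epoch >= patch_start:  # late stage: patch in known samples
            k = max(1, int(len(new_samples) * patch_ratio))
            data += rng.sample(known_samples, k)
        rng.shuffle(data)
        yield epoch, data
```

The key design point, per the paper's finding, is that only a small number of known samples in the late stage is needed to counteract the attention drift caused by learning new knowledge; early epochs remain pure new-knowledge training.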
Authors
Renfei Dang (National Key Laboratory for Novel Software Technology, Nanjing University, China)
Peng Hu (National Key Laboratory for Novel Software Technology, Nanjing University, China)
Changjiang Gao (PhD student, Nanjing University; Natural Language Processing)
Shujian Huang (School of Computer Science, Nanjing University; Natural Language Processing, Machine Translation, Multilingualism, Large Language Models)