Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates why large language models (LLMs) exhibit factual hallucinations, i.e., erroneous outputs about known facts, when fine-tuned to incorporate new knowledge. To diagnose the root cause systematically, the authors construct the fine-grained, controllable Biography-Reasoning dataset, revealing that high unfamiliarity of a newly introduced knowledge type, rather than the overall proportion of new knowledge, is the primary driver of hallucination, with effects that transfer across knowledge types. Building on this insight, they propose KnownPatch: a lightweight, interpretable intervention that injects a small set of verified, pre-existing knowledge samples during the late stage of fine-tuning. Experiments show that KnownPatch substantially suppresses hallucinations triggered by new knowledge, restores the model's attention to critical entities, and improves both accuracy and factual consistency on knowledge-intensive question answering and reasoning tasks, validating the approach both mechanistically and empirically.

📝 Abstract
Previous studies show that introducing new knowledge during fine-tuning of large language models (LLMs) can lead to erroneous output when the models are tested on known information, thereby triggering factual hallucinations. However, existing studies have not deeply investigated the specific manifestations and underlying mechanisms of these hallucinations. Our work addresses this gap by designing a controlled dataset, Biography-Reasoning, and conducting a fine-grained analysis across multiple knowledge types and two task types: knowledge question answering (QA) and knowledge reasoning. We find that when an LLM is fine-tuned on a dataset in which a specific knowledge type consists entirely of new knowledge, it exhibits significantly increased hallucination tendencies. This suggests that the high unfamiliarity of a particular knowledge type, rather than the overall proportion of new knowledge, is the stronger driver of hallucinations, and these tendencies can even spread to other knowledge types in QA tasks. To mitigate such factual hallucinations, we propose KnownPatch, which patches a small number of known knowledge samples into the later stages of training, effectively alleviating new-knowledge-induced hallucinations. Through attention analysis, we find that learning new knowledge reduces the model's attention to key entities in the question, causing excessive focus on the surrounding context and thereby increasing the risk of hallucination. Moreover, this attention pattern can propagate to similar contexts, spreading hallucinations to textually similar questions. Our method effectively limits the disruption that new-knowledge learning causes to the model's attention on key entities, and improves performance as well.
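The attention finding above suggests a simple diagnostic: measure how much of a query token's attention mass lands on the key-entity tokens versus the surrounding context, and track how that share drops as new knowledge is learned. The sketch below is illustrative only; the function name `entity_attention_share` and its inputs are assumptions, not an API from the paper, and it operates on a single attention row (one query position of one head) given as a list of floats.

```python
def entity_attention_share(attn_weights, entity_positions):
    """Fraction of one query token's attention mass that falls on
    the key-entity token positions (as opposed to the context).

    attn_weights: attention row for a single query token (e.g. the
        final question token), a list of non-negative floats.
    entity_positions: indices of the key-entity tokens in the input.
    """
    total = sum(attn_weights)
    if total == 0:
        return 0.0
    return sum(attn_weights[i] for i in entity_positions) / total

# Toy example: attention row over 5 tokens; the entity spans positions 1-2.
row = [0.05, 0.40, 0.35, 0.10, 0.10]
share = entity_attention_share(row, [1, 2])  # 0.75
```

In practice the row would come from a model's attention tensors (e.g. with attention outputs enabled in a transformer library); a falling entity share after fine-tuning on new knowledge would match the paper's reported pattern.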
Problem

Research questions and friction points this paper is trying to address.

Analyzes how new knowledge introduced during fine-tuning causes factual hallucinations in LLMs
Identifies knowledge-type unfamiliarity as the primary driver of hallucination tendencies
Proposes the KnownPatch method to mitigate hallucinations by restoring attention to key entities
Innovation

Methods, ideas, or system contributions that make the work stand out.

KnownPatch adds known knowledge in late training stages
Method reduces hallucinations by adjusting attention patterns
Technique maintains focus on key entities in questions
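The late-stage patching idea can be sketched as a data-scheduling routine: train on the new-knowledge samples as usual, then, from some late epoch onward, mix a small number of verified known samples into each epoch's data. This is a minimal sketch under stated assumptions; the function name, the 10% patch ratio, and the 80% start fraction are illustrative defaults, not the paper's hyperparameters.

```python
import random

def build_epoch_data(new_samples, known_samples, total_epochs,
                     patch_start_frac=0.8, patch_ratio=0.1, seed=0):
    """Yield (epoch, shuffled training data) per epoch.

    For the final (1 - patch_start_frac) of training, a small sample of
    known-knowledge items (patch_ratio of the new-knowledge set size,
    at least one) is mixed into each epoch's data.
    """
    rng = random.Random(seed)
    patch_start = int(total_epochs * patch_start_frac)
    for epoch in range(total_epochs):
        data = list(new_samples)
        if epoch >= patch_start:  # late stage: patch in known samples
            k = max(1, int(len(new_samples) * patch_ratio))
            data += rng.sample(known_samples, k)
        rng.shuffle(data)
        yield epoch, data
```

The key design point, per the paper's finding, is that only a small number of known samples in the late stage is needed to counteract the attention drift caused by learning new knowledge; early epochs remain pure new-knowledge training.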
Authors
Renfei Dang (National Key Laboratory for Novel Software Technology, Nanjing University, China)
Peng Hu (National Key Laboratory for Novel Software Technology, Nanjing University, China)
Changjiang Gao (PhD student, Nanjing University; Natural Language Processing)
Shujian Huang (School of Computer Science, Nanjing University; Natural Language Processing, Machine Translation, Multilingualism, Large Language Models)