Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

📅 2025-04-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently generate hallucinated content inconsistent with facts or context, a phenomenon whose complex, multifaceted origins remain difficult to diagnose. This paper introduces the *subsequence association causal explanation paradigm*, presented as the first framework to attribute hallucinations to dominant erroneous subsequence associations in the decoder that override faithful ones, and formally establishes that Transformer decoders intrinsically function as subsequence embedding models. Methodologically, the authors propose a randomized-context probabilistic tracing algorithm, integrated with linear-layer association analysis and validated empirically against training-corpus evidence, yielding interpretable and reproducible hallucination tracing. Experiments across multiple tasks show that the approach significantly outperforms standard attribution baselines: it precisely localizes causal hallucinatory subsequences, and the identified associations align closely with their actual distributions in the training corpus.

📝 Abstract
Large language models (LLMs) frequently generate hallucinations, content that deviates from factual accuracy or provided context, posing challenges for diagnosis due to the complex interplay of underlying causes. This paper introduces a subsequence association framework to systematically trace and understand hallucinations. Our key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. Through theoretical and empirical analyses, we demonstrate that decoder-only transformers effectively function as subsequence embedding models, with linear layers encoding input-output associations. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts. Experiments show our method outperforms standard attribution techniques in identifying hallucination causes and aligns with evidence from the model's training corpus. This work provides a unified perspective on hallucinations and a robust framework for their tracing and analysis.
Problem

Research questions and friction points this paper is trying to address.

Understanding why LLMs generate hallucinatory content
Tracing hallucinations via subsequence association framework
Identifying the causal subsequences behind model hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Subsequence association framework traces hallucinations systematically
Decoder-only transformers act as subsequence embedding models
Tracing algorithm identifies causal subsequences via hallucination probabilities
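The tracing idea in the bullets above can be sketched roughly as follows: plant each candidate subsequence of the prompt into otherwise randomized contexts, and estimate how often the model still reproduces the hallucinated output. This is a minimal illustrative sketch, not the paper's actual algorithm; the `model_outputs` callable, the toy vocabulary, and all parameter names are assumptions for illustration.

```python
import random

def trace_causal_subsequences(prompt_tokens, hallucinated, model_outputs,
                              vocab, num_samples=200, max_len=3):
    """For each candidate subsequence of the prompt, estimate the
    probability that the model emits the hallucinated output when that
    subsequence is planted in otherwise random contexts.

    model_outputs(tokens) is a hypothetical stand-in for a real LLM
    call; it must return the generated token sequence.
    """
    scores = {}
    n = len(prompt_tokens)
    for length in range(1, max_len + 1):
        for start in range(n - length + 1):
            sub = tuple(prompt_tokens[start:start + length])
            hits = 0
            for _ in range(num_samples):
                # Build a random context, then plant the candidate
                # subsequence at a random position.
                ctx = [random.choice(vocab) for _ in range(n)]
                pos = random.randrange(n - length + 1)
                ctx[pos:pos + length] = sub
                if model_outputs(ctx) == hallucinated:
                    hits += 1
            scores[sub] = hits / num_samples
    # Rank subsequences by estimated hallucination probability;
    # high scores flag candidate causal associations.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

In use, a subsequence that keeps triggering the hallucination even inside random contexts scores near 1.0, while incidental context tokens score near the base rate, which mirrors the intuition that a dominant hallucinatory association outweighs the faithful ones.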