Banishing LLM Hallucinations Requires Rethinking Generalization

📅 2024-06-25
🏛️ arXiv.org
📈 Citations: 14
Influential: 1
🤖 AI Summary
This paper challenges the conventional view that LLM hallucinations arise from a trade-off between creativity and factual accuracy, arguing instead that hallucination is fundamentally a failure of generalization that occurs when training loss remains above a threshold. Method: the authors propose MoME (Mixture of Memory Experts), a hybrid architecture built around a massive bank of retrievable expert memories; dynamic sparse retrieval over these experts lets the model treat memory as fact storage and retrieval as reasoning. Contribution/Results: the paper gives a theoretical and empirical refutation of the dominant explanations for hallucination, establishing a causal link between hallucination and the training loss threshold. Building on this insight, the authors develop Lamini-1, a first-generation model designed to remove hallucinations, which significantly reduces hallucination rates across multiple factuality benchmarks. The results support a new paradigm: replacing internal parametric knowledge with scalable, external memory systems for robust factual grounding.
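As a rough illustration of the MoME idea (dynamic sparse retrieval over a large bank of memory experts), the sketch below implements a hypothetical mixture-of-memory-experts layer in PyTorch: a projected query scores a bank of learned key/value memory slots, only the top-k experts are retrieved, and their values are mixed back into the hidden state. The class name `MemoryExpertLayer`, the parameter choices, and the dense scoring step are assumptions made for this example, not the paper's implementation; at the scale of millions of experts, the paper's dynamic retrieval would correspond to an approximate nearest-neighbor lookup over the keys rather than a full matmul.

```python
# Hypothetical mixture-of-memory-experts layer (illustrative sketch, not the paper's code).
# A large bank of learned key/value memory slots is queried sparsely: only the
# top-k experts selected by the routing scores contribute to the output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryExpertLayer(nn.Module):
    def __init__(self, d_model: int, num_experts: int, top_k: int = 32):
        super().__init__()
        self.top_k = top_k
        self.keys = nn.Embedding(num_experts, d_model)     # routing keys
        self.values = nn.Embedding(num_experts, d_model)   # where "facts" live
        self.query_proj = nn.Linear(d_model, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model)
        q = self.query_proj(hidden)                               # (B, S, D)
        scores = q @ self.keys.weight.T                           # (B, S, num_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)     # sparse routing
        weights = F.softmax(top_scores, dim=-1)                   # (B, S, k)
        selected = self.values(top_idx)                           # (B, S, k, D)
        memory_out = (weights.unsqueeze(-1) * selected).sum(dim=-2)
        return hidden + memory_out                                # residual update

# Quick shape check with a small expert bank.
layer = MemoryExpertLayer(d_model=64, num_experts=10_000, top_k=8)
out = layer(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```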

📝 Abstract
Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold, as it usually is in practice when training on internet-scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first-generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.
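The abstract's memorization claim (models with enough memory capacity can easily memorize large datasets of random numbers, and hallucination on trained facts tracks whether training loss drops below a threshold) can be probed at toy scale. The sketch below is a hypothetical illustration under that framing, not the authors' experimental setup: it trains a tiny next-token model on random key:value facts, then reports the final training loss next to exact-recall accuracy on those facts. The model, fact format, and hyperparameters (`TinyLM`, 4-digit keys and values, 2000 steps) are assumptions made for the example.

```python
# Toy memorization probe (illustrative sketch, not the paper's experiments):
# train a small next-token model on random "key:value;" facts and compare the
# final training loss with exact-recall accuracy on the memorized values.
import random
import torch
import torch.nn as nn

random.seed(0)
torch.manual_seed(0)

VOCAB = list("0123456789:;")                      # digits, ':' separator, ';' end marker
stoi = {ch: i for i, ch in enumerate(VOCAB)}

def make_fact() -> str:
    key = "".join(random.choice("0123456789") for _ in range(4))
    val = "".join(random.choice("0123456789") for _ in range(4))
    return f"{key}:{val};"

facts = [make_fact() for _ in range(200)]
data = torch.tensor([[stoi[c] for c in f] for f in facts])   # (200, 10)

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, d: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = TinyLM(len(VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):                          # next-token training on the facts
    logits = model(data[:, :-1])
    loss = loss_fn(logits.reshape(-1, len(VOCAB)), data[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

def recall(fact: str) -> bool:
    # Prompt with "dddd:" and greedily decode the 4-digit value.
    prompt = [stoi[c] for c in fact[:5]]
    for _ in range(4):
        logits = model(torch.tensor([prompt]))
        prompt.append(int(logits[0, -1].argmax()))
    return "".join(VOCAB[i] for i in prompt[5:]) == fact[5:9]

acc = sum(recall(f) for f in facts) / len(facts)
print(f"final training loss: {loss.item():.3f}  exact value recall: {acc:.1%}")
```

In this toy, the loss never reaches zero because the random key digits are inherently unpredictable, yet the value digits after the ':' can still be recalled exactly once memorized.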
Problem

Research questions and friction points this paper is trying to address.

LLMs hallucinate despite external knowledge grounding
Ability to memorize large random datasets challenges conventional explanations of generalization
Training loss above a threshold causes neural networks to hallucinate
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Memory Experts (MoME) architecture
Millions of memory experts retrieved dynamically at inference time
Stores facts in a massive, specialized memory system
👥 Authors
Johnny Li (Lamini)
Saksham Consul (Lamini)
Eda Zhou (Lamini)
James Wong (Lamini)
Naila Farooqui (Lamini)
Yuxin Ye (Lamini)
Nithyashree Manohar (Lamini)
Zhuxiaona Wei (Lamini)
Tian Wu (Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences)
Ben Echols (Lamini)
Sharon Zhou (Lamini)
Gregory Diamos (Lamini)