Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

📅 2024-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently hallucinate, yet the relationship between training dynamics and erroneous generation remains poorly understood. Method: We propose SenD, a novel training protocol that identifies embedding indices with high variability across training (Sensitive Embedding Indices) via sensitivity analysis and deterministically drops them to reduce training variance. We also introduce Efficient EigenScore (EES), an unsupervised hallucination detection metric that approximates the traditional EigenScore at roughly 2x the speed, and integrate it into the protocol to keep SenD computationally scalable. Results: Evaluated on Pythia models (70M-12B parameters), SenD reduces hallucination at test time by up to 40% compared to normal training and improves factual accuracy when adapting LLMs to Wikipedia, medical, and LegalBench domains. The core contribution is uncovering intrinsic links between training dynamics and hallucination and delivering a scalable, annotation-free, lightweight training intervention.
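The deterministic dropout step described above can be sketched roughly as follows. This is an illustrative reading of the summary, not the paper's implementation: the snapshot format, function names, and top-k variance selection are all assumptions.

```python
import numpy as np

def sensitive_indices(embedding_snapshots, k):
    # embedding_snapshots: (num_checkpoints, num_indices) array of an
    # embedding statistic tracked across training checkpoints.
    # Indices with the highest variance across checkpoints are treated
    # as "Sensitive Embedding Indices" (illustrative assumption).
    variance = embedding_snapshots.var(axis=0)
    return np.argsort(variance)[-k:]

def send_dropout(hidden, sensitive):
    # Deterministically zero the sensitive indices. Unlike standard
    # dropout there is no random mask and no rescaling: the same
    # indices are suppressed at every step.
    out = hidden.copy()
    out[..., sensitive] = 0.0
    return out
```

The key contrast with standard dropout is that the mask is fixed by the variance ranking rather than sampled anew each step, which is what makes the intervention a variance-reduction mechanism rather than a regularizer.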

📝 Abstract
As large language models (LLMs) are increasingly deployed across various industries, concerns regarding their reliability, particularly due to hallucinations (outputs that are factually inaccurate or irrelevant to user input), have grown. Our research investigates the relationship between the training process and the emergence of hallucinations to address a key gap in existing research that focuses primarily on post hoc detection and mitigation strategies. Using models from the Pythia suite (70M-12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce Sensitivity Dropout (SenD), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SenD achieves this by deterministically dropping embedding indices with significant variability, referred to as Sensitive Embedding Indices. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at 2x speed. This efficient metric is integrated into our protocol, allowing SenD to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training while also providing an efficient method to improve factual accuracy when adapting LLMs to Wikipedia, Medical, and LegalBench domains.
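EES is described as an approximation of the traditional EigenScore, which scores hallucination risk from the spectrum of the covariance of embeddings of several sampled generations. A minimal sketch of that underlying idea might look like the following; the Gram-matrix form, the `alpha` regularizer, and the function signature are assumptions, and the paper's actual EES speedup is not reproduced here.

```python
import numpy as np

def eigenscore(embeddings, alpha=1e-3):
    # embeddings: (K, d) sentence embeddings of K sampled generations
    # for the same prompt. Higher scores indicate more semantic
    # diversity (less self-consistency), which prior work links to
    # hallucination. alpha regularizes the covariance spectrum.
    K = embeddings.shape[0]
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # (K, K) Gram form has the same nonzero spectrum as the (d, d)
    # covariance but is cheaper when d >> K.
    cov = centered @ centered.T / K
    eigvals = np.linalg.eigvalsh(cov) + alpha
    return float(np.log(eigvals).sum() / K)
```

If all K generations embed identically, the centered matrix vanishes and the score collapses to log(alpha); diverse generations yield larger eigenvalues and a higher score.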
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Training Process
Hallucination Mitigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

SenD Training Method
EES Hallucination Detection Metric
Performance Improvement