🤖 AI Summary
Problem: Existing hallucination detection methods for retrieval-augmented generation (RAG) lack fine-grained attribution mechanisms that pinpoint the root causes of hallucinations. Method: We propose SPAD, the first approach to decompose the conditional probability of each generated token into seven interpretable contributions (query, retrieved context, previously generated tokens, current token, initial embedding, final layer normalization, and feed-forward network) and to aggregate these contributions by part-of-speech (POS) tag, exposing anomalies in component-level influence that are specific to a syntactic category. Contribution/Results: SPAD enables syntax-aware localization of hallucination origins. Evaluated across multiple RAG benchmarks, it significantly outperforms prior binary conflict-detection methods in both accuracy and interpretability, setting a new state of the art. SPAD also requires no additional training, is model-agnostic, and is efficient to deploy.
📝 Abstract
Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains a challenge. Prior approaches attribute hallucinations to a binary conflict between internal knowledge (stored in FFNs) and retrieved context. However, this perspective is incomplete: it fails to account for the other components of the generative process, such as the user query, previously generated tokens, the current token itself, and the final LayerNorm adjustment. To address this, we introduce SPAD. First, we mathematically decompose each token's probability into contributions from seven distinct sources: Query, RAG (the retrieved context), Past (previously generated tokens), Current Token, FFN, Final LayerNorm, and Initial Embedding. This attribution quantifies how much each source contributes to the generation of the current token. Then, we aggregate these scores by POS tag to quantify how the different components drive specific linguistic categories. By identifying anomalies, such as nouns that rely on Final LayerNorm, SPAD effectively detects hallucinations. Extensive experiments demonstrate that SPAD achieves state-of-the-art performance.
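
The POS aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how scores are pooled, so averaging per tag is an assumption, and the function name `aggregate_by_pos`, the source keys in `SOURCES`, and the toy scores are all hypothetical. The per-token attribution scores and POS tags are assumed to be computed elsewhere.

```python
from collections import defaultdict

# Hypothetical keys for the seven sources named in the abstract.
SOURCES = ("query", "rag", "past", "current", "ffn", "final_ln", "init_emb")


def aggregate_by_pos(token_scores, pos_tags):
    """Average each source's attribution score over tokens sharing a POS tag.

    token_scores: one dict per generated token, mapping every name in
                  SOURCES to that token's attribution score.
    pos_tags:     one POS tag per generated token, in the same order.
    Mean pooling is an assumption; the abstract only says "aggregate".
    """
    if len(token_scores) != len(pos_tags):
        raise ValueError("token_scores and pos_tags must have equal length")

    sums = defaultdict(lambda: dict.fromkeys(SOURCES, 0.0))
    counts = defaultdict(int)
    for scores, tag in zip(token_scores, pos_tags):
        counts[tag] += 1
        for source in SOURCES:
            sums[tag][source] += scores[source]

    # Divide the per-tag sums by the number of tokens carrying that tag.
    return {
        tag: {source: sums[tag][source] / counts[tag] for source in SOURCES}
        for tag in sums
    }


if __name__ == "__main__":
    # Toy values for three generated tokens; not taken from the paper.
    toy_scores = [
        dict.fromkeys(SOURCES, 0.0) | {"rag": 0.5, "ffn": 0.25},
        dict.fromkeys(SOURCES, 0.0) | {"rag": 0.25, "final_ln": 0.5},
        dict.fromkeys(SOURCES, 0.0) | {"past": 0.75},
    ]
    toy_tags = ["NOUN", "NOUN", "VERB"]
    for tag, profile in aggregate_by_pos(toy_scores, toy_tags).items():
        print(tag, profile)
```

Each POS tag ends up with a seven-dimensional profile. A detector along the lines the abstract describes could then compare these profiles against those of faithful responses, for example by flagging nouns whose Final LayerNorm share is unusually high. How SPAD actually scores such anomalies is not stated in this section.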