SPAD: Seven-Source Token Probability Attribution with Syntactic Aggregation for Detecting Hallucinations in RAG

📅 2025-12-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing hallucination detection methods for retrieval-augmented generation (RAG) lack fine-grained attribution mechanisms for pinpointing the root causes of hallucinations. Method: We propose SPAD, an approach that decomposes the conditional probability of each generated token into seven interpretable contributions (Query, RAG, Past, Current Token, FFN, Final LayerNorm, and Initial Embedding) and aggregates these contributions by part-of-speech (POS) tags to identify anomalies in component-level influence that are specific to a syntactic category. Contribution/Results: SPAD enables syntax-aware localization of hallucination origins. Evaluated across multiple RAG benchmarks, it outperforms state-of-the-art binary conflict-detection methods in both accuracy and interpretability. SPAD requires no additional training, is model-agnostic, and is efficient to deploy.

📝 Abstract

Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains a challenge. Prior approaches attribute hallucinations to a binary conflict between internal knowledge (stored in FFNs) and retrieved context. However, this perspective is incomplete: it fails to account for the impact of other components in the generative process, such as the user query, previously generated tokens, the current token itself, and the final LayerNorm adjustment. To address this, we introduce SPAD. First, we mathematically decompose each token's probability into seven distinct sources: Query, RAG, Past, Current Token, FFN, Final LayerNorm, and Initial Embedding. This attribution quantifies how each source contributes to the generation of the current token. Then, we aggregate these scores by POS tags to quantify how different components drive specific linguistic categories. By identifying anomalies, such as Nouns relying on Final LayerNorm, SPAD effectively detects hallucinations. Extensive experiments demonstrate that SPAD achieves state-of-the-art performance.
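To make the idea of additive attribution concrete, here is a minimal sketch. It is not the paper's derivation. It assumes a toy residual stream whose final hidden state is the sum of named component vectors, and it splits one token's logit into per-component terms by holding the LayerNorm scale fixed. The component names, dimensions, and random vectors are hypothetical. SPAD attributes probabilities and treats Final LayerNorm as its own source, which this sketch does not reproduce.

```python
import numpy as np

def layer_norm(h, gamma, beta, eps=1e-5):
    """Standard LayerNorm over a single hidden vector."""
    return gamma * (h - h.mean()) / np.sqrt(h.var() + eps) + beta

def decompose_logit(components, w_token, gamma, beta, eps=1e-5):
    """Split w_token . LayerNorm(sum(components)) into per-component terms.

    Assumption: the LayerNorm scale (sigma) is computed once on the full
    hidden state and then treated as a constant.
    """
    h = sum(components.values())
    sigma = np.sqrt(h.var() + eps)
    # Mean-centering is linear, so each component can be centered on its own.
    parts = {
        name: float(w_token @ (gamma * (c - c.mean())) / sigma)
        for name, c in components.items()
    }
    # The LayerNorm bias contributes a term that belongs to no component.
    parts["ln_bias"] = float(w_token @ beta)
    return parts

# Hypothetical toy data: six residual components of dimension 16.
rng = np.random.default_rng(0)
d = 16
names = ["query", "rag", "past", "current_token", "ffn", "initial_embedding"]
components = {n: rng.normal(size=d) for n in names}
w_token = rng.normal(size=d)          # unembedding row of the generated token
gamma, beta = rng.normal(size=d), rng.normal(size=d)

parts = decompose_logit(components, w_token, gamma, beta)
direct = float(w_token @ layer_norm(sum(components.values()), gamma, beta))
```

In this toy setting the per-component terms plus the bias term add up to the directly computed logit, which is the property any additive attribution of this kind needs.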

Problem

Research questions and friction points this paper is trying to address.

Detects hallucinations in Retrieval-Augmented Generation (RAG) systems

Attributes token probability to seven sources beyond binary conflict

Aggregates scores by POS tags to identify linguistic anomalies
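The POS aggregation step can be sketched as follows, assuming per-token attribution scores and POS tags are already available. The page does not state which aggregation function SPAD uses, so the mean here is an assumption. The function name `aggregate_by_pos` and the example scores are hypothetical.

```python
from collections import defaultdict

# The seven sources named in the abstract.
SOURCES = ["Query", "RAG", "Past", "Current Token",
           "FFN", "Final LayerNorm", "Initial Embedding"]

def aggregate_by_pos(token_scores, pos_tags):
    """Average each source's score over all tokens sharing a POS tag.

    token_scores: one dict per generated token, mapping source -> score.
    pos_tags: one POS tag per generated token, in the same order.
    """
    if len(token_scores) != len(pos_tags):
        raise ValueError("token_scores and pos_tags must have equal length")
    sums = defaultdict(lambda: dict.fromkeys(SOURCES, 0.0))
    counts = defaultdict(int)
    for scores, tag in zip(token_scores, pos_tags):
        for name in SOURCES:
            sums[tag][name] += scores[name]
        counts[tag] += 1
    return {tag: {name: sums[tag][name] / counts[tag] for name in SOURCES}
            for tag in sums}

# Hypothetical scores for a three-token response.
zeros = dict.fromkeys(SOURCES, 0.0)
token_scores = [
    {**zeros, "RAG": 0.6, "FFN": 0.1},
    {**zeros, "Past": 0.5},
    {**zeros, "RAG": 0.2, "Final LayerNorm": 0.4},
]
pos_tags = ["NOUN", "VERB", "NOUN"]
profile = aggregate_by_pos(token_scores, pos_tags)
```

The result is one seven-dimensional profile per POS tag, which gives a fixed-size feature set regardless of response length.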

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attributing token probability to seven distinct sources

Aggregating scores by POS tags for linguistic analysis

Detecting hallucinations via anomalies in source contributions
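One simple way to flag anomalous source contributions is a z-score against a reference profile built from faithful responses. This is an illustrative stand-in, not the detector used in the paper, which this page does not describe. The function `find_anomalies`, the threshold, and all numbers below are hypothetical.

```python
def find_anomalies(profile, reference, threshold=3.0):
    """Return (pos, source, z) triples that deviate strongly from a reference.

    profile: dict pos -> dict source -> aggregated score for one response.
    reference: dict (pos, source) -> (mean, std), assumed to be estimated
        from responses known to be faithful.
    """
    flagged = []
    for pos, sources in profile.items():
        for source, value in sources.items():
            mean, std = reference.get((pos, source), (None, None))
            if mean is None or std <= 0:
                continue  # no usable reference statistics for this pair
            z = (value - mean) / std
            if abs(z) >= threshold:
                flagged.append((pos, source, z))
    # Strongest deviations first.
    return sorted(flagged, key=lambda item: -abs(item[2]))

# Hypothetical reference statistics and one response profile.
reference = {
    ("NOUN", "Final LayerNorm"): (0.05, 0.02),
    ("NOUN", "RAG"): (0.50, 0.10),
}
profile = {"NOUN": {"Final LayerNorm": 0.25, "RAG": 0.45}}
flagged = find_anomalies(profile, reference)
```

Here the noun profile leans on Final LayerNorm far more than the reference does, mirroring the anomaly named in the abstract, while the RAG contribution stays within the normal range.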

🔎 Similar Papers

LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation