SPAD: Seven-Source Token Probability Attribution with Syntactic Aggregation for Detecting Hallucinations in RAG

📅 2025-12-08
🤖 AI Summary
Problem: Existing hallucination detection methods in retrieval-augmented generation (RAG) lack fine-grained attribution mechanisms to pinpoint the root causes of hallucinations. Method: We propose SPAD, the first approach to decompose the conditional probability of each generated token into seven interpretable contributions (query, retrieved context, previously generated tokens, current token, initial embedding, final LayerNorm, and feed-forward network) and to aggregate these contributions by part-of-speech (POS) tag, exposing anomalies in component-level influence that are specific to a syntactic category. Contribution/Results: SPAD enables syntax-aware localization of hallucination origins. Evaluated across multiple RAG benchmarks, it outperforms state-of-the-art binary conflict-detection methods in both accuracy and interpretability. SPAD also requires no additional training, is model-agnostic, and is efficient to deploy.

📝 Abstract
Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains a challenge. Prior approaches attribute hallucinations to a binary conflict between internal knowledge (stored in FFNs) and retrieved context. However, this perspective is incomplete: it fails to account for the impact of other components in the generative process, such as the user query, previously generated tokens, the current token itself, and the final LayerNorm adjustment. To address this, we introduce SPAD. First, we mathematically decompose each token's probability into seven distinct sources: Query, RAG, Past, Current Token, FFN, Final LayerNorm, and Initial Embedding. This attribution quantifies how each source contributes to the generation of the current token. Then, we aggregate these scores by POS tags to quantify how different components drive specific linguistic categories. By identifying anomalies, such as Nouns relying on Final LayerNorm, SPAD effectively detects hallucinations. Extensive experiments demonstrate that SPAD achieves state-of-the-art performance.
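
To make the idea of a seven-way attribution concrete, the following is a minimal sketch, not the paper's actual derivation. It assumes a toy setting in which six sources each write a vector into the residual stream, the final LayerNorm (here without learned gain or bias) is treated as an additive shift `LN(h) - h`, and attribution is done at the logit level rather than on the softmax probability. The function names, vectors, and the unembedding row `w_token` are all hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def layer_norm(h, eps=1e-5):
    """Plain LayerNorm without learned gain or bias (a simplifying assumption)."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]

def attribute_logit(parts, w_token):
    """Split one token's logit into per-source scores.

    parts:   source name -> vector that source adds to the residual stream
    w_token: unembedding row of the generated token (hypothetical)
    """
    dim = len(w_token)
    # Residual stream before the final LayerNorm: the sum of all source vectors.
    h = [sum(vec[i] for vec in parts.values()) for i in range(dim)]
    normed = layer_norm(h)
    # Treat the LayerNorm adjustment as a seventh additive vector.
    ln_shift = [n - x for n, x in zip(normed, h)]
    scores = {name: dot(vec, w_token) for name, vec in parts.items()}
    scores["Final LayerNorm"] = dot(ln_shift, w_token)
    logit = dot(normed, w_token)
    return scores, logit

# Toy vectors, invented for illustration only.
parts = {
    "Query": [0.2, 0.1, -0.3],
    "RAG": [1.0, -0.5, 0.4],
    "Past": [0.1, 0.0, 0.2],
    "Current Token": [-0.2, 0.3, 0.1],
    "FFN": [0.3, 0.6, -0.1],
    "Initial Embedding": [0.05, -0.1, 0.2],
}
w_token = [0.5, -1.0, 2.0]
scores, logit = attribute_logit(parts, w_token)
```

Because the dot product is linear, the seven scores sum to the token's logit up to floating-point error, which is the property any such attribution has to satisfy. How SPAD carries the attribution through to the probability itself is not described in this summary.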
Problem

Research questions and friction points this paper is trying to address.

Detects hallucinations in Retrieval-Augmented Generation (RAG) systems
Attributes token probability to seven sources beyond binary conflict
Aggregates scores by POS tags to identify linguistic anomalies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attributing token probability to seven distinct sources
Aggregating scores by POS tags for linguistic analysis
Detecting hallucinations via anomalies in source contributions
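
The aggregation step listed above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the input format (a POS tag plus a dictionary of source scores per token), the helper names `aggregate_by_pos` and `exceeds`, the example scores, and the threshold are all hypothetical, and the paper may use a different statistic or a learned detector on top of the aggregated profile.

```python
from collections import defaultdict

def aggregate_by_pos(tokens):
    """Average source scores within each POS tag.

    tokens: list of (pos_tag, {source: score}) pairs, one per generated token
    returns {pos_tag: {source: mean score}}
    A source missing from a token counts as 0 for that token.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for pos, scores in tokens:
        counts[pos] += 1
        for source, value in scores.items():
            sums[pos][source] += value
    return {
        pos: {source: total / counts[pos] for source, total in by_source.items()}
        for pos, by_source in sums.items()
    }

def exceeds(profile, pos, source, threshold):
    """Hypothetical anomaly rule: mean score for (pos, source) above a threshold."""
    return profile.get(pos, {}).get(source, 0.0) > threshold

# Invented scores for three generated tokens.
tokens = [
    ("NOUN", {"RAG": 0.6, "Final LayerNorm": 0.1}),
    ("NOUN", {"RAG": 0.2, "Final LayerNorm": 0.5}),
    ("VERB", {"RAG": 0.3, "Final LayerNorm": 0.1}),
]
profile = aggregate_by_pos(tokens)
noun_ln_anomaly = exceeds(profile, "NOUN", "Final LayerNorm", threshold=0.25)
```

The resulting `profile` is a small table indexed by POS tag and source, which could be flattened into a feature vector for a downstream detector. The single-threshold rule only illustrates the kind of anomaly the abstract mentions (Nouns relying on Final LayerNorm).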
Pengqian Lu
Australian Artificial Intelligence Institute (AAII), University of Technology Sydney, Ultimo, NSW 2007, Australia
Jie Lu
Australian Artificial Intelligence Institute (AAII), University of Technology Sydney, Ultimo, NSW 2007, Australia
Anjin Liu
Australian Artificial Intelligence Institute (AAII), University of Technology Sydney, Ultimo, NSW 2007, Australia
Guangquan Zhang
University of Technology Sydney, Australia
fuzzy sets and systems, machine learning, decision support systems