TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing hallucination detection methods—designed for autoregressive LLMs (AR-LLMs) and relying on single-step generation signals—are ill-suited for diffusion large language models (D-LLMs), whose hallucinations emerge progressively across multi-step denoising. Method: We propose the first trajectory-aware hallucination detection paradigm for D-LLMs, modeling the denoising process as a conditional action trajectory. Our approach introduces a sub-trajectory maximal informativeness analysis framework that quantifies each step’s contribution to response refinement, thereby identifying hallucination-critical signals. It jointly integrates conditional response prediction with sub-trajectory saliency modeling. Contribution/Results: Evaluated on multiple open-source D-LLMs, our method achieves an average 15.2% improvement in AUROC over strong baselines. It provides the first interpretable, trajectory-aware hallucination detection framework for trustworthy D-LLM evaluation.

Technology Category

Application Category

📝 Abstract

Diffusion large language models (D-LLMs) have recently emerged as a promising alternative to auto-regressive LLMs (AR-LLMs). However, the hallucination problem in D-LLMs remains underexplored, limiting their reliability in real-world applications. Existing hallucination detection methods are designed for AR-LLMs and rely on signals from single-step generation, making them ill-suited for D-LLMs where hallucination signals often emerge throughout the multi-step denoising process. To bridge this gap, we propose TraceDet, a novel framework that explicitly leverages the intermediate denoising steps of D-LLMs for hallucination detection. TraceDet models the denoising process as an action trace, with each action defined as the model's prediction over the cleaned response, conditioned on the previous intermediate output. By identifying the sub-trace that is maximally informative to the hallucinated responses, TraceDet leverages the key hallucination signals in the multi-step denoising process of D-LLMs for hallucination detection. Extensive experiments on various open source D-LLMs demonstrate that TraceDet consistently improves hallucination detection, achieving an average gain in AUROC of 15.2% compared to baselines.

Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinations in diffusion large language models

Addressing limitations of single-step AR-LLM detection methods

Leveraging multi-step denoising traces for hallucination signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages intermediate denoising steps for detection

Models denoising process as an action trace

Identifies maximally informative sub-trace for hallucinations

🔎 Similar Papers

No similar papers found.

Authors to Follow