Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion language models (dLLMs) lack effective self-evaluation methods because their non-sequential, bidirectional generation mechanism makes it challenging to assess output quality. To address this, the authors propose DiSE, a novel approach that uses the probability of regenerating a sequence within its full contextual window as a confidence metric for efficient self-evaluation of generated outputs. DiSE not only enables uncertainty quantification but also correlates strongly with semantic coherence and answer accuracy. Furthermore, it establishes a unified framework capable of adaptively controlling generation length. Experimental results show that DiSE excels in likelihood-based evaluation and flexible-length generation tasks, significantly enhancing the controllability and reliability of dLLMs.
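The confidence metric described above can be sketched as a length-normalized regeneration log-likelihood. This is a minimal illustration, not the paper's code: `regeneration_confidence` and the toy probabilities stand in for a real dLLM forward pass that re-masks each position and scores the original token against the full surrounding context.

```python
import math

def regeneration_confidence(token_probs):
    """DiSE-style confidence sketch: mean log-probability of
    regenerating each token of the generated sequence, conditioned
    on the full context.

    token_probs[i] is assumed to be the model's probability of
    reproducing token i when position i is re-masked and the rest
    of the sequence stays visible (the dLLM itself is not shown).
    """
    log_probs = [math.log(p) for p in token_probs]
    # Length normalization makes scores comparable across sequences
    # of different lengths.
    return sum(log_probs) / len(log_probs)

# Two hypothetical generations: one the model regenerates confidently,
# one it does not.
coherent = [0.95, 0.90, 0.97, 0.92]
incoherent = [0.40, 0.30, 0.55, 0.25]

assert regeneration_confidence(coherent) > regeneration_confidence(incoherent)
```

A higher score indicates the model would confidently reproduce its own output given full bidirectional context, which the paper reports as correlating with semantic coherence and answer accuracy.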

📝 Abstract
Diffusion large language models (dLLMs) have recently attracted significant attention for their ability to enhance diversity, controllability, and parallelism. However, their non-sequential, bidirectionally masked generation makes quality assessment difficult, underscoring the need for effective self-evaluation. In this work, we propose DiSE, a simple yet effective self-evaluation confidence quantification method for dLLMs. DiSE quantifies confidence by computing the probability of regenerating the tokens in the entire generated sequence, given the full context. This method enables more efficient and reliable quality assessment by leveraging token regeneration probabilities, facilitating both likelihood estimation and robust uncertainty quantification. Building upon DiSE, we further introduce a flexible-length generation framework, which adaptively controls the sequence length based on the model's self-assessment of its own output. We analyze and validate the feasibility of DiSE from the perspective of dLLM generalization, and empirically demonstrate that DiSE is positively correlated with both semantic coherence and answer accuracy. Extensive experiments on likelihood evaluation, uncertainty quantification, and flexible-length generation further confirm the effectiveness of the proposed DiSE.
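The flexible-length framework mentioned in the abstract could plausibly be sketched as scoring candidate generations of different lengths with the regeneration metric and keeping the most confident one. The helper names and toy probabilities below are illustrative assumptions, not the paper's implementation:

```python
import math

def dise_score(token_probs):
    # Length-normalized log-probability of regenerating the sequence,
    # given the full context (per-token probabilities assumed given).
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def pick_length(candidates):
    """Hypothetical flexible-length selection: score candidate
    generations at several lengths and return the length whose
    output the model regenerates most confidently.

    candidates maps length -> per-token regeneration probabilities.
    """
    return max(candidates, key=lambda length: dise_score(candidates[length]))

# Toy candidates at three generation lengths; the 16-token candidate
# has the highest per-token regeneration probability and wins.
candidates = {
    8: [0.6] * 8,
    16: [0.9] * 16,
    32: [0.5] * 32,
}
assert pick_length(candidates) == 16
```

Length normalization is the key design choice here: without it, shorter sequences would trivially score higher, so adaptive length control would collapse to always choosing the shortest candidate.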
Problem

Research questions and friction points this paper is trying to address.

diffusion language models
self-evaluation
quality assessment
non-sequential generation
uncertainty quantification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Language Models
Self-Evaluation
Sequence Regeneration
Uncertainty Quantification
Flexible-Length Generation