🤖 AI Summary
Existing masked diffusion language models (DLMs) struggle to correct erroneous tokens after generation because they cannot dynamically identify and regenerate low-confidence segments. To address this, we propose RemeDi, a remasking-enabled diffusion language model that introduces a token-level confidence prediction module to autonomously detect and remask unreliable tokens during generation, enabling iterative diffusion-based resampling for self-reflection and correction. RemeDi is trained via a unified pipeline integrating masked diffusion modeling, remask-aware supervised fine-tuning, and reinforcement learning over full generation trajectories. On multiple text generation benchmarks, RemeDi significantly outperforms existing open-source DLMs, achieving state-of-the-art accuracy. Notably, it is among the first diffusion-based models to support iterative refinement of already-generated tokens, a step toward self-correcting generative language modeling.
📝 Abstract
Mask-based Diffusion Language Models (DLMs) struggle to revise incorrect tokens: once a token is generated, it typically remains fixed. The key challenge is to identify potential errors in the input. In this paper, we propose the Remasking-enabled Diffusion Language Model (RemeDi), a mask-based DLM that introduces remasking as another fundamental mechanism, enabling more flexible text refinement in diffusion-based text generation. To achieve this, RemeDi jointly predicts a token distribution and a per-token confidence score at each step. The confidence scores determine which tokens are unmasked after the current step, allowing the model to identify low-quality tokens and remask them; these remasked tokens can be resampled with richer context in subsequent steps. We design a remask-aware pipeline to train this ability, including supervised fine-tuning, which teaches the model to detect and remask incorrect tokens in addition to predicting masked tokens, and reinforcement learning, which optimizes full generation trajectories toward higher rewards. Experiments show that RemeDi achieves state-of-the-art results among open-source DLMs on multiple datasets.
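To make the unmask/remask mechanic concrete, here is a minimal sketch of one denoising step as the abstract describes it: given per-position token distributions and confidence scores, the most confident masked positions are filled in, while previously generated tokens whose confidence falls below a threshold are returned to the mask state for later resampling. The function name, the `MASK` sentinel, and the hyperparameters (`unmask_k`, `remask_tau`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

MASK = -1  # hypothetical sentinel for the [MASK] token id


def remedi_step(seq, logits, confidence, unmask_k=2, remask_tau=0.3):
    """One illustrative RemeDi-style denoising step (simplified sketch).

    seq:        (L,) current token ids, MASK where still masked
    logits:     (L, V) per-position token distributions
    confidence: (L,) per-token confidence scores in [0, 1]
    """
    seq = seq.copy()
    masked = seq == MASK

    # Unmask: fill the k most confident masked positions with their
    # highest-probability token.
    cand = np.where(masked, confidence, -np.inf)
    for pos in np.argsort(-cand)[:unmask_k]:
        if np.isfinite(cand[pos]):
            seq[pos] = logits[pos].argmax()

    # Remask: previously generated tokens with low confidence are
    # returned to MASK so they can be resampled with richer context.
    seq[(~masked) & (confidence < remask_tau)] = MASK
    return seq
```

In a full sampler this step would be iterated, with the model re-scoring the sequence each round, so a token remasked at one step can be regenerated later once its neighbors provide more context.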