Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing masked diffusion language models (DLMs) struggle to correct erroneous tokens post-generation because they cannot dynamically identify and regenerate low-confidence segments. To address this, we propose RemeDi, a remasking-enabled diffusion language model that introduces a token-level confidence prediction module to autonomously detect and remask unreliable tokens during generation, enabling iterative diffusion-based regeneration for self-reflection and correction. RemeDi is trained via a unified framework integrating masked diffusion modeling, confidence-aware supervised fine-tuning, and reinforcement learning optimization. On multiple text generation benchmarks, RemeDi significantly outperforms existing open-source DLMs, achieving state-of-the-art performance in accuracy, fluency, and error correction. Notably, it is among the first diffusion-based models to support controllable, iterative text refinement, marking a step toward self-correcting generative language modeling.

📝 Abstract
Mask-based Diffusion Language Models (DLMs) struggle to revise incorrect tokens: once a token is generated, it typically remains fixed. The key challenge is to identify potential errors in the inputs. In this paper, we propose the Remasking-enabled Diffusion Language Model (RemeDi), a mask-based DLM that introduces remasking as another fundamental mechanism, enabling more flexible text refinement in diffusion-based text generation. To achieve this, RemeDi jointly predicts token distributions and per-token confidence scores at each step. The confidence scores determine which tokens remain unmasked after the current step, allowing the model to identify low-quality tokens and remask them; these remasked tokens can then be resampled with richer context in subsequent steps. We design a remask-aware training pipeline for this ability, including supervised fine-tuning, which teaches the model to detect and remask incorrect tokens in addition to predicting masked tokens, and reinforcement learning, which optimizes full generation trajectories toward higher rewards. Experiments show that RemeDi achieves state-of-the-art results among open-source DLMs on multiple datasets.
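The abstract's core mechanism, jointly predicting tokens and per-token confidence, then keeping only the most confident positions unmasked, can be sketched as follows. This is a minimal illustrative example, not the paper's implementation; the `MASK` id, the argmax decoding, and the top-k keep rule are all simplifying assumptions.

```python
import numpy as np

MASK = -1  # hypothetical mask token id (assumption, not from the paper)

def remask_step(logits, confidence, seq_len, keep_k):
    """One simplified RemeDi-style denoising step.

    logits:     (seq_len, vocab) token distributions predicted by the model
    confidence: (seq_len,) per-token confidence scores, also model-predicted
    keep_k:     number of positions left unmasked after this step

    Propose a token at every position, then keep only the keep_k most
    confident positions; all other positions are (re)masked so they can be
    resampled with richer context in later steps.
    """
    proposed = logits.argmax(axis=-1)      # greedy proposal per position
    order = np.argsort(-confidence)        # positions, most confident first
    new_seq = np.full(seq_len, MASK, dtype=np.int64)
    for pos in order[:keep_k]:
        new_seq[pos] = proposed[pos]       # unmask the confident tokens only
    return new_seq
```

Over successive steps, `keep_k` would grow until no mask tokens remain; positions that were unmasked earlier can still drop back to `MASK` if their confidence falls, which is the "remasking" behavior the paper trains for.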
Problem

Research questions and friction points this paper is trying to address.

Addresses token revision limitations in diffusion language models
Proposes remasking mechanism for flexible text refinement
Enables identification and resampling of low-quality tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces remasking mechanism for flexible text refinement
Jointly predicts token distributions and confidence scores
Uses supervised fine-tuning and reinforcement learning training
Zemin Huang
PhD student, Westlake University, Zhejiang University
Diffusion Model, Autoregressive Model, Diffusion Distillation
Yuhang Wang
MAPLE Lab, Westlake University
Zhiyang Chen
MAPLE Lab, Westlake University
Guo-Jun Qi
MAPLE Lab, Westlake University