🤖 AI Summary
Existing masked diffusion language models (DLMs) struggle to correct erroneous tokens after generation because they cannot dynamically identify and regenerate low-confidence segments. To address this, we propose RemeDi, a remasking-enabled diffusion language model that introduces a token-level confidence prediction module to autonomously detect and remask unreliable tokens during generation, enabling iterative diffusion-based resampling for self-reflection and correction. RemeDi is trained via a unified pipeline integrating masked diffusion modeling, remask-aware supervised fine-tuning, and reinforcement learning over full generation trajectories. On multiple text generation benchmarks, RemeDi significantly outperforms existing open-source DLMs, achieving state-of-the-art accuracy. Notably, it is among the first diffusion-based models to support iterative refinement of already-generated tokens, a step toward self-correcting generative language modeling.
📝 Abstract
Mask-based Diffusion Language Models (DLMs) struggle to revise incorrect tokens: once a token is generated, it typically remains fixed. The key challenge is to identify potential errors in the input. In this paper, we propose the Remasking-enabled Diffusion Language Model (RemeDi), a mask-based DLM that introduces remasking as another fundamental mechanism, enabling more flexible text refinement in diffusion-based text generation. To achieve this, RemeDi jointly predicts a token distribution and a per-token confidence score at each step. The confidence scores determine which tokens are unmasked after the current step, allowing the model to identify low-quality tokens and remask them; these remasked tokens can be resampled with richer context in subsequent steps. We design a remask-aware pipeline to train this ability, including supervised fine-tuning, which teaches the model to detect and remask incorrect tokens in addition to predicting masked tokens, and reinforcement learning, which optimizes full generation trajectories toward higher rewards. Experiments show that RemeDi achieves state-of-the-art results among open-source DLMs on multiple datasets.
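To make the unmask/remask mechanic concrete, here is a minimal sketch of one denoising step as the abstract describes it: given per-position token distributions and confidence scores, the most confident masked positions are filled in, while previously generated tokens whose confidence falls below a threshold are returned to the mask state for later resampling. The function name, the `MASK` sentinel, and the hyperparameters (`unmask_k`, `remask_tau`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

MASK = -1  # hypothetical sentinel for the [MASK] token id


def remedi_step(seq, logits, confidence, unmask_k=2, remask_tau=0.3):
    """One illustrative RemeDi-style denoising step (simplified sketch).

    seq:        (L,) current token ids, MASK where still masked
    logits:     (L, V) per-position token distributions
    confidence: (L,) per-token confidence scores in [0, 1]
    """
    seq = seq.copy()
    masked = seq == MASK

    # Unmask: fill the k most confident masked positions with their
    # highest-probability token.
    cand = np.where(masked, confidence, -np.inf)
    for pos in np.argsort(-cand)[:unmask_k]:
        if np.isfinite(cand[pos]):
            seq[pos] = logits[pos].argmax()

    # Remask: previously generated tokens with low confidence are
    # returned to MASK so they can be resampled with richer context.
    seq[(~masked) & (confidence < remask_tau)] = MASK
    return seq
```

In a full sampler this step would be iterated, with the model re-scoring the sequence each round, so a token remasked at one step can be regenerated later once its neighbors provide more context.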