🤖 AI Summary
Masked diffusion models (MDMs) suffer from an inability to revise already generated tokens: once a token is unmasked it stays fixed, leading to error accumulation and degraded sample quality. To address this limitation, this work proposes the Progressive Self-Correction (ProSeCo) framework, which jointly learns unmasking and correction during training and alternates between unmasking and refinement steps during generation, enabling iterative refinement of the entire sequence. ProSeCo introduces a dynamic correction mechanism for previously generated tokens, lifting the fixed-once-unmasked constraint of MDMs by repurposing the denoising network as a corrector and pairing it with an iterative sampling strategy that improves output fidelity. Experiments across multiple generative tasks show that ProSeCo samples up to 2–3× faster and improves sample quality by up to 1.3× through test-time compute scaling.
📝 Abstract
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, leading to error accumulation and ultimately degraded sample quality. We address this by proposing a framework that trains a single model to perform both unmasking and correction. By reusing outputs from the MDM denoising network as inputs for corrector training, the model learns to recover from its own potential mistakes. During generation, we interleave corrective refinement steps between unmasking steps, allowing already decoded tokens to be revised and outputs improved. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence, including already generated tokens. Extensive experiments on multiple conditional and unconditional tasks demonstrate that ProSeCo yields better quality-efficiency trade-offs (up to ~2-3x faster sampling) and enables inference-time compute scaling to further increase sample quality beyond standard MDMs (up to ~1.3x improvement on benchmarks).
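The alternating unmask/correct loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `proseco_sample` name, the `denoiser` callable, the step counts, and the left-to-right commit order are all assumptions made for the example.

```python
MASK = "<mask>"

def proseco_sample(length, n_unmask_steps, n_correct_steps, denoiser):
    """Alternate unmasking and correction steps (illustrative sketch).

    `denoiser` stands in for the MDM denoising network: given the current
    (partially masked) sequence, it proposes a token for every position.
    """
    seq = [MASK] * length
    still_masked = list(range(length))
    per_step = max(1, length // n_unmask_steps)
    for _ in range(n_unmask_steps):
        # Unmasking step: commit the denoiser's proposals at a few
        # still-masked positions, as a standard MDM sampler would.
        proposals = denoiser(seq)
        for pos in still_masked[:per_step]:
            seq[pos] = proposals[pos]
        still_masked = still_masked[per_step:]
        # Correction steps: re-run the denoiser on the partially decoded
        # sequence and let it overwrite already-committed tokens, which a
        # standard MDM sampler would keep frozen.
        for _ in range(n_correct_steps):
            proposals = denoiser(seq)
            for pos in range(length):
                if seq[pos] != MASK:
                    seq[pos] = proposals[pos]
    return seq

# Toy deterministic "denoiser" that proposes each position's index as its token.
out = proseco_sample(8, 4, 1, lambda toks: [str(i) for i in range(len(toks))])
```

The key difference from plain MDM sampling is the inner correction loop, which feeds decoded tokens back through the denoising network so earlier mistakes can still be revised.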