BackPlay: Plug-in Look-Back Self-Correction for Diffusion Language Models

📅 2026-01-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the quality degradation that arises in multi-token parallel generation with diffusion language models, where dependency errors propagate as the generation step size increases. To mitigate this, the authors propose BackPlay, a plug-in self-correction framework: a finetuned base model is frozen to preserve its peak performance, and a dedicated correction head is trained on top of it using the errors the frozen model itself produces, so the head captures the model's intrinsic error distribution. Through a Look-back Correction mechanism, the head exploits the richer context available at later generation steps to identify and rectify subtle mistakes committed earlier. With this decoupled two-stage training strategy, BackPlay substantially alleviates quality deterioration at large step sizes on mathematical reasoning and code generation tasks, achieving high-fidelity outputs without sacrificing the efficiency of parallel generation.
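The two-stage recipe described above (freeze the finetuned backbone, then train only a correction head on the errors that the frozen model itself produces) can be caricatured in a few lines. Everything below is a hypothetical stand-in for illustration, not the paper's architecture: the synthetic error model, the single context-size feature, and the logistic-regression "head" are all invented here.

```python
import math
import random

random.seed(0)

def frozen_backbone_err(context_size):
    """Stand-in for the frozen, finetuned DLM: predictions made with little
    committed context are more likely to be wrong (dependency errors)."""
    p_error = max(0.0, 0.8 - 0.1 * context_size)
    return random.random() < p_error  # True means this commit was an error

# Stage 1 is assumed done (the backbone is finetuned, then frozen).
# Stage 2: collect the frozen model's own mistakes and fit only the head.
data = [(ctx, frozen_backbone_err(ctx)) for ctx in range(9) for _ in range(200)]

# The "correction head" here is just a 1-feature logistic regression that
# learns which commits to distrust; the real head is a neural module.
w, b, lr = 0.0, 0.0, 0.05
for _ in range(100):                      # SGD over the error dataset
    for ctx, err in data:
        p = 1.0 / (1.0 + math.exp(-(w * ctx + b)))
        grad = p - (1.0 if err else 0.0)  # dLoss/dlogit for log-loss
        w -= lr * grad * ctx              # only head parameters are updated;
        b -= lr * grad                    # the backbone stays frozen

def head_flags(ctx):
    """Flag a committed token for revision if the head predicts it was
    likely produced with too little context."""
    return 1.0 / (1.0 + math.exp(-(w * ctx + b))) > 0.5
```

Because the backbone is never updated, its peak single-step quality is preserved; the head only learns where that fixed model tends to fail.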

πŸ“ Abstract
Diffusion Language Models (DLMs) have achieved significant efficiency gains by generating multiple tokens in parallel. However, this parallel sampling approach, especially when using fewer inference steps, introduces strong dependency errors, causing quality to deteriorate rapidly as the generation step size grows. As a result, reliable self-correction becomes essential for maintaining high-quality multi-token generation. To address this, we propose BackPlay, a plug-in framework that enables DLMs to perform autonomous self-correction. BackPlay freezes the parameters of a finetuned DLM to preserve its peak performance while training a specialized correction head added on top of the model. This head is trained specifically on the errors generated by the frozen, well-optimized model, enabling it to capture the model's intrinsic error distribution. To further enhance the head's effectiveness, we introduce Look-back Correction, a training mechanism that empowers the head to leverage current contextual information to supervise and rectify mistakes made in earlier generation steps. During inference, our framework enables the model to jointly generate and revise tokens, effectively mitigating error accumulation. Experiments on mathematical reasoning and code generation benchmarks demonstrate that our approach substantially reduces quality degradation in large-step generation, allowing DLMs to achieve both high speed and strong output fidelity.
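As a heavily simplified illustration of the generate-then-revise loop the abstract describes, the toy sketch below alternates parallel commits from a stand-in "frozen backbone" with a look-back pass that re-masks tokens judged inconsistent under the now-richer context. The target sentence, the fake backbone, and the bigram-based consistency check are all hypothetical stand-ins, not the paper's actual components.

```python
MASK = None
TARGET = ["the", "cat", "sat", "on", "the", "mat"]
BIGRAMS = set(zip(TARGET, TARGET[1:]))  # proxy for learned local consistency

def backbone_propose(tokens):
    """Frozen-DLM stand-in: fills masked slots, but guesses wrongly when a
    position is predicted with too little committed context."""
    ctx = sum(t is not MASK for t in tokens)
    return {i: (TARGET[i] if ctx >= i else "???")
            for i, t in enumerate(tokens) if t is MASK}

def correction_head(tokens):
    """Look-back stand-in: with richer current context, flag committed
    tokens that break local consistency so they can be re-masked."""
    flagged = []
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        if a is not MASK and b is not MASK and (a, b) not in BIGRAMS:
            flagged.append(i + 1)  # blame the later, lower-context token
    return flagged

def backplay_decode(k=3, use_head=True, max_steps=12):
    tokens = [MASK] * len(TARGET)
    revisions = 0
    for _ in range(max_steps):
        # 1) parallel generation: commit up to k masked positions at once
        for i, tok in sorted(backbone_propose(tokens).items())[:k]:
            tokens[i] = tok
        # 2) look-back correction: re-mask tokens now judged inconsistent
        if use_head:
            for i in correction_head(tokens):
                tokens[i] = MASK
                revisions += 1
        if MASK not in tokens and (not use_head or not correction_head(tokens)):
            break
    return tokens, revisions
```

Running `backplay_decode(use_head=True)` recovers the target after several revisions, while `use_head=False` commits early low-context guesses and keeps the resulting errors, mimicking the quality degradation at large step sizes.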
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
Parallel Generation
Dependency Errors
Self-Correction
Generation Quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Self-Correction
Look-back Correction
Masked Diffusion Language Models
Parallel Token Generation
Error Correction