Corrective Diffusion Language Models

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion language models (DLMs) hold promise for iterative error correction, yet standard masked DLMs (MDLMs) fail to reliably identify erroneous tokens, rendering confidence-based refinement ineffective. To address this, we propose a correction-oriented post-training paradigm, the first to enable error-aware confidence modeling and controllable in-place correction. Our method explicitly strengthens the model's ability to detect and replace unreliable tokens via error-label supervision, confidence calibration, and iterative refinement. We further introduce the Code Revision Benchmark (CRB), the first executable code revision benchmark, designed to quantitatively evaluate correction capability. Experiments demonstrate that our approach significantly outperforms MDLM baselines on code revision tasks while also improving general text generation quality. The implementation is publicly available.

📝 Abstract
Diffusion language models are structurally well-suited for iterative error correction, as their non-causal denoising dynamics allow arbitrary positions in a sequence to be revised. However, standard masked diffusion language model (MDLM) training fails to reliably induce this behavior, as models often cannot identify unreliable tokens in a complete input, rendering confidence-guided refinement ineffective. We study corrective behavior in diffusion language models, defined as the ability to assign lower confidence to incorrect tokens and iteratively refine them while preserving correct content. We show that this capability is not induced by conventional masked diffusion objectives and propose a correction-oriented post-training principle that explicitly supervises visible incorrect tokens, enabling error-aware confidence and targeted refinement. To evaluate corrective behavior, we introduce the Code Revision Benchmark (CRB), a controllable and executable benchmark for assessing error localization and in-place correction. Experiments on code revision tasks and controlled settings demonstrate that models trained with our approach substantially outperform standard MDLMs in correction scenarios, while also improving pure completion performance. Our code is publicly available at https://github.com/zhangshuibai/CDLM.
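The confidence-guided refinement the abstract describes can be illustrated with a toy loop: given a complete draft, re-mask the positions where the model is least confident and re-predict only those. This is a minimal sketch, not the paper's implementation; `confidence_fn` and `predict_fn` stand in for the model's per-token confidence and its in-place denoising step.

```python
def refine(tokens, confidence_fn, predict_fn, rounds=3, frac=0.1):
    """Iteratively replace the least-confident tokens in place.

    tokens        : list of token ids (a complete draft sequence)
    confidence_fn : tokens -> per-token confidence scores (hypothetical)
    predict_fn    : (tokens, positions) -> replacement ids (hypothetical)
    """
    for _ in range(rounds):
        conf = confidence_fn(tokens)
        k = max(1, int(len(tokens) * frac))
        # indices of the k lowest-confidence positions
        worst = sorted(range(len(tokens)), key=lambda i: conf[i])[:k]
        for pos, tok in zip(worst, predict_fn(tokens, worst)):
            tokens[pos] = tok
    return tokens
```

The paper's point is that this loop only helps if the model's confidence is actually lower on erroneous tokens, which standard MDLM training does not guarantee.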
Problem

Research questions and friction points this paper is trying to address.

Standard diffusion language models fail to identify unreliable tokens for correction
Conventional training objectives cannot induce error-aware confidence and targeted refinement
Lack of controllable benchmarks to assess error localization and in-place correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces correction-oriented post-training for error-aware confidence
Proposes Code Revision Benchmark for evaluating error localization
Enables targeted refinement by supervising visible incorrect tokens
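The supervision idea in the first and third bullets can be sketched as a toy objective: given binary error labels marking which visible tokens are incorrect, push the model's per-token confidence down on erroneous positions and up on correct ones. This is an illustrative binary cross-entropy under assumed inputs, not the paper's actual training loss.

```python
import math

def error_label_loss(confidences, error_labels):
    """Toy error-label supervision: binary cross-entropy that rewards
    low confidence on incorrect tokens and high confidence on correct ones.

    confidences  : per-token confidence in (0, 1) from the model
    error_labels : 1 if the visible token is incorrect, else 0
    """
    eps = 1e-9
    total = 0.0
    for c, e in zip(confidences, error_labels):
        target = 0.0 if e else 1.0  # be confident only on correct tokens
        total -= target * math.log(c + eps) + (1 - target) * math.log(1 - c + eps)
    return total / len(confidences)
```

A model trained this way yields confidences that a refinement loop can trust for error localization.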