CORE: Context-Robust Remasking for Diffusion Language Models

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Masked diffusion language models are prone to early, high-confidence yet erroneous predictions during decoding due to contextual rigidity, leading to cascading errors. This work proposes a training-free, inference-time correction framework that identifies and prioritizes the revision of contextually fragile tokens by analyzing their sensitivity to contextual perturbations. The approach formalizes this process as a robust optimization objective, thereby overcoming the limitations of conventional methods that rely solely on static confidence scores. Experimental results on LLaDA-8B-Base demonstrate significant improvements in both reasoning and code generation, with performance gains of up to 9.2 percentage points on the MBPP benchmark, outperforming baseline methods with comparable computational overhead.

📝 Abstract
Standard decoding in Masked Diffusion Models (MDMs) is hindered by context rigidity: tokens are retained based on transient high confidence, often ignoring that early predictions lack full context. This creates cascade effects where initial inconsistencies misguide the remaining generation. Existing revision strategies attempt to mitigate this by relying on static confidence scores, but these signals are inherently myopic; inconsistent tokens can appear confident to the model itself. We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. We formalize revision as a robust optimization objective over context shifts and efficiently approximate this objective to prioritize unstable tokens for revision. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
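The abstract's core mechanism, identifying context-brittle tokens by probing their sensitivity to masked-context perturbations and remasking the most unstable ones, can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's actual algorithm: `toy_predict`, `context_sensitivity`, and `select_for_remask` are invented names, the toy model is a stand-in for an MDM prediction head, and the paper's robust optimization objective and perturbation scheme are only approximated here by random context masking.

```python
import random

MASK = "<mask>"

def toy_predict(context, pos):
    # Toy stand-in for an MDM prediction head: position 0 is
    # context-independent ("robust"); later positions depend on the
    # left neighbor, so masking that neighbor flips them ("brittle").
    if pos == 0:
        return "start"
    left = context[pos - 1]
    return "unk" if left == MASK else "after_" + left

def context_sensitivity(predict, tokens, pos, n_probes=16, mask_frac=0.3, seed=0):
    """Fraction of masked-context perturbations that flip the prediction at pos."""
    rng = random.Random(seed)
    base = predict(tokens, pos)
    others = [i for i in range(len(tokens)) if i != pos and tokens[i] != MASK]
    if not others:
        return 0.0
    k = max(1, int(mask_frac * len(others)))
    flips = 0
    for _ in range(n_probes):
        perturbed = list(tokens)
        for i in rng.sample(others, k):
            perturbed[i] = MASK  # perturb the context by remasking a subset
        if predict(perturbed, pos) != base:
            flips += 1
    return flips / n_probes

def select_for_remask(predict, tokens, budget=1, **kw):
    """Pick the `budget` most context-brittle decoded tokens to remask."""
    scores = {i: context_sensitivity(predict, tokens, i, **kw)
              for i, t in enumerate(tokens) if t != MASK}
    return sorted(scores, key=scores.get, reverse=True)[:budget]
```

The flip rate plays the role of a cheap sampled approximation to the robustness objective over context shifts: tokens whose predictions survive perturbation are trusted, while unstable ones are returned to the masked set for revision regardless of their static confidence.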
Problem

Research questions and friction points this paper is trying to address.

context rigidity
masked diffusion models
cascade errors
token confidence
inconsistent generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-Robust Remasking
Masked Diffusion Models
context perturbation
robust optimization
inference-time revision