CORE: Context-Robust Remasking for Diffusion Language Models

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Masked diffusion language models are prone to early, high-confidence yet erroneous predictions during decoding due to contextual rigidity, leading to cascading errors. This work proposes a training-free, inference-time correction framework that identifies and prioritizes the revision of contextually fragile tokens by analyzing their sensitivity to contextual perturbations. The approach formalizes this process as a robust optimization objective, thereby overcoming the limitations of conventional methods that rely solely on static confidence scores. Experimental results on LLaDA-8B-Base demonstrate significant improvements in both reasoning and code generation, with performance gains of up to 9.2 percentage points on the MBPP benchmark, outperforming baseline methods with comparable computational overhead.

📝 Abstract
Standard decoding in Masked Diffusion Models (MDMs) is hindered by context rigidity: tokens are retained based on transient high confidence, often ignoring that early predictions lack full context. This creates cascade effects where initial inconsistencies misguide the remaining generation. Existing revision strategies attempt to mitigate this by relying on static confidence scores, but these signals are inherently myopic; inconsistent tokens can appear confident to the model itself. We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. We formalize revision as a robust optimization objective over context shifts and efficiently approximate this objective to prioritize unstable tokens for revision. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
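The abstract's core mechanism, identifying context-brittle tokens by probing their sensitivity to masked-context perturbations and remasking the most unstable ones, can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's actual algorithm: `toy_predict`, `context_sensitivity`, and `select_for_remask` are invented names, the toy model is a stand-in for an MDM prediction head, and the paper's robust optimization objective and perturbation scheme are only approximated here by random context masking.

```python
import random

MASK = "<mask>"

def toy_predict(context, pos):
    # Toy stand-in for an MDM prediction head: position 0 is
    # context-independent ("robust"); later positions depend on the
    # left neighbor, so masking that neighbor flips them ("brittle").
    if pos == 0:
        return "start"
    left = context[pos - 1]
    return "unk" if left == MASK else "after_" + left

def context_sensitivity(predict, tokens, pos, n_probes=16, mask_frac=0.3, seed=0):
    """Fraction of masked-context perturbations that flip the prediction at pos."""
    rng = random.Random(seed)
    base = predict(tokens, pos)
    others = [i for i in range(len(tokens)) if i != pos and tokens[i] != MASK]
    if not others:
        return 0.0
    k = max(1, int(mask_frac * len(others)))
    flips = 0
    for _ in range(n_probes):
        perturbed = list(tokens)
        for i in rng.sample(others, k):
            perturbed[i] = MASK  # perturb the context by remasking a subset
        if predict(perturbed, pos) != base:
            flips += 1
    return flips / n_probes

def select_for_remask(predict, tokens, budget=1, **kw):
    """Pick the `budget` most context-brittle decoded tokens to remask."""
    scores = {i: context_sensitivity(predict, tokens, i, **kw)
              for i, t in enumerate(tokens) if t != MASK}
    return sorted(scores, key=scores.get, reverse=True)[:budget]
```

The flip rate plays the role of a cheap sampled approximation to the robustness objective over context shifts: tokens whose predictions survive perturbation are trusted, while unstable ones are returned to the masked set for revision regardless of their static confidence.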
Problem

Research questions and friction points this paper is trying to address.

context rigidity
masked diffusion models
cascade errors
token confidence
inconsistent generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-Robust Remasking
Masked Diffusion Models
context perturbation
robust optimization
inference-time revision