Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models

📅 2026-04-12

🤖 AI Summary
This work addresses a critical yet previously underexplored issue in non-autoregressive diffusion language models: susceptibility to proximity bias during inference, which causes generated sequences to over-rely on initial demasking positions and propagate spatial errors. The study systematically uncovers the underlying mechanism of this bias and introduces a minimalist intervention strategy that combines a lightweight planner with end-of-sequence temperature annealing to guide high-confidence token selection in early denoising steps. Requiring only marginal computational overhead, the proposed approach consistently outperforms existing heuristic baselines across diverse reasoning and planning tasks, thereby highlighting the pivotal role of early-stage decisions in determining overall generation quality.

📝 Abstract
Diffusion-based language models (dLLMs) have emerged as a promising alternative to autoregressive language models, offering the potential for parallel token generation and bidirectional context modeling. However, harnessing this flexibility for fully non-autoregressive decoding remains an open question, particularly for reasoning and planning tasks. In this work, we investigate non-autoregressive decoding in dLLMs by systematically analyzing its inference dynamics along the temporal axis. Specifically, we uncover an inherent failure mode in confidence-based non-autoregressive generation stemming from a strong proximity bias: the tendency for the denoising order to concentrate on spatially adjacent tokens. This local dependency leads to spatial error propagation, rendering the entire trajectory critically contingent on the initial unmasking position. Leveraging this insight, we present a minimal-intervention approach that guides early token selection, employing a lightweight planner and end-of-sequence temperature annealing. We thoroughly evaluate our method on various reasoning and planning tasks and observe substantial overall improvement over existing heuristic baselines without significant computational overhead.
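
To make the notion of proximity bias concrete, the following is a toy sketch and not the paper's method. It assumes a hypothetical confidence function (`make_local_confidence`) whose scores are highest next to already-revealed tokens, runs greedy confidence-based unmasking one token at a time, and measures how far each newly unmasked position lies from the nearest revealed one. All function names and the scoring rule are illustrative assumptions; a real dLLM would supply per-position confidences from its predicted token distributions.

```python
def confidence_order(score_fn, length):
    """Greedy confidence-based unmasking: reveal one masked position per step,
    always choosing the position with the highest confidence score."""
    unmasked = []
    masked = set(range(length))
    while masked:
        # Iterate in sorted order so ties resolve to the lowest index.
        best = max(sorted(masked), key=lambda i: score_fn(i, unmasked))
        unmasked.append(best)
        masked.remove(best)
    return unmasked


def mean_nearest_distance(order):
    """Average distance from each newly unmasked position to the nearest
    position that was already unmasked (small values = strong locality)."""
    dists = [
        min(abs(order[step] - p) for p in order[:step])
        for step in range(1, len(order))
    ]
    return sum(dists) / len(dists)


def make_local_confidence(start):
    """Hypothetical scorer (an assumption, not a real model): the first pick is
    forced to `start`; afterwards confidence decays with distance to the
    nearest revealed token."""
    def score(i, unmasked):
        if not unmasked:
            return 1.0 if i == start else 0.0
        return 1.0 / min(abs(i - p) for p in unmasked)
    return score


order = confidence_order(make_local_confidence(start=3), length=8)
locality = mean_nearest_distance(order)
print(order, locality)
```

Under this toy scorer the unmasking order grows outward from the starting position, so the whole trajectory is determined by the first pick. That is the dependence on the initial unmasking position that the abstract describes, here reproduced only by construction.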
Problem

Research questions and friction points this paper is trying to address.

proximity bias
non-autoregressive decoding
diffusion language models
error propagation
initial trajectory
Innovation

Methods, ideas, or system contributions that make the work stand out.

proximity bias
non-autoregressive decoding
diffusion language models
initial trajectory shaping
temperature annealing