🤖 AI Summary
Masked diffusion models (MDMs) are highly sensitive to the choice of decoding strategy in sequence generation, and existing uncertainty-based samplers exhibit two critical flaws: a lack of global trajectory control and a premature preference for trivial tokens early in decoding. To address these, the authors propose Position-Aware Confidence-Calibrated Sampling (PC-Sampler), which unifies global trajectory planning with content-aware informativeness maximization. PC-Sampler applies a position-aware weighting mechanism to regulate the decoding path and a calibrated confidence score to suppress the premature selection of trivial tokens. Evaluated on three advanced MDMs across seven challenging benchmarks, PC-Sampler consistently outperforms prior MDM decoding strategies by more than 10% on average, substantially narrowing the performance gap with autoregressive models and establishing a more robust and controllable paradigm for non-autoregressive sequence generation.
📝 Abstract
Recent advances in masked diffusion models (MDMs) have established them as powerful non-autoregressive alternatives for sequence generation. Nevertheless, our preliminary experiments reveal that the generation quality of MDMs is still highly sensitive to the choice of decoding strategy. In particular, widely adopted uncertainty-based samplers suffer from two key limitations: a lack of global trajectory control and a pronounced bias toward trivial tokens in the early stages of decoding. These shortcomings restrict the full potential of MDMs. In this work, we introduce Position-Aware Confidence-Calibrated Sampling (PC-Sampler), a novel decoding strategy that unifies global trajectory planning with content-aware informativeness maximization. PC-Sampler incorporates a position-aware weighting mechanism to regulate the decoding path and a calibrated confidence score to suppress the premature selection of trivial tokens. Extensive experiments on three advanced MDMs across seven challenging benchmarks, including logical reasoning and planning tasks, demonstrate that PC-Sampler consistently outperforms existing MDM decoding strategies by more than 10% on average, significantly narrowing the performance gap with state-of-the-art autoregressive models. All code is available at https://github.com/NEUIR/PC-Sampler.
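To make the two ingredients concrete, the sketch below shows one way a decoding step could combine a position-aware weight with a calibrated confidence score when choosing which masked position to unmask next. The specific formulas (an exponential left-to-right position prior and mean-subtraction calibration), the function name, and all parameters are illustrative assumptions, not the paper's actual definitions.

```python
import math

def select_next_position(confidences, masked_positions, seq_len, alpha=0.5):
    """Illustrative sketch of a position-aware, confidence-calibrated
    selection step for an MDM decoder. NOTE: the weighting and
    calibration formulas here are hypothetical stand-ins, not the
    definitions from the PC-Sampler paper.

    confidences: per-position model confidence (index -> float)
    masked_positions: positions still masked at this step
    seq_len: total sequence length
    alpha: strength of the position prior (assumed hyperparameter)
    """
    # Calibration: subtract the mean confidence over masked positions,
    # so tokens that are uniformly "easy" (trivial) do not dominate
    # the earliest decoding steps.
    mean_conf = sum(confidences[p] for p in masked_positions) / len(masked_positions)
    scores = {}
    for pos in masked_positions:
        # Position prior: exponentially favor earlier positions,
        # imposing loose left-to-right trajectory control.
        weight = math.exp(-alpha * pos / seq_len)
        scores[pos] = weight * (confidences[pos] - mean_conf)
    # Unmask the position with the highest combined score.
    return max(scores, key=scores.get)
```

With this toy scoring, a highly confident token late in the sequence can still win over a mildly confident early one, while a token whose confidence merely matches the average is suppressed regardless of position; the trade-off between the two terms is governed by `alpha`.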