🤖 AI Summary
This work addresses the challenge of increasing decoding parallelism in Masked Diffusion Language Models (MDMs) without sacrificing mathematical reasoning performance. We propose T*, a progressive block-size scaling training strategy built on trajectory-aware reinforcement learning (TraceRL): training begins from a small-block, autoregressive-style initialization and smoothly transitions to larger block configurations. This curriculum yields substantial gains in parallel decoding efficiency while preserving near-identical performance on mathematical reasoning benchmarks. Notably, the method introduces trajectory-aware reinforcement learning into MDM training for the first time, and further analysis shows it can also discover alternative decoding schedules that converge to comparable performance, demonstrating both effectiveness and generalization potential.
📝 Abstract
We present T*, a simple TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small-block MDM, T* transitions smoothly to larger blocks, enabling higher-parallelism decoding with minimal performance degradation on math reasoning benchmarks. Further analysis suggests that T* can also converge to an alternative decoding schedule that achieves comparable performance.
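The progressive block-size curriculum can be sketched as a step-indexed schedule that starts near-autoregressive and grows the block size as training proceeds. The milestone steps, block sizes, and function name below are illustrative placeholders, not values from the paper:

```python
# Hypothetical sketch of a progressive block-size curriculum for MDM training.
# Milestones and block sizes are made-up placeholders for illustration only.

def block_size_schedule(step, milestones=(1000, 3000, 6000), sizes=(4, 8, 16, 32)):
    """Return the decoding block size to train with at a given step.

    Training starts with the smallest (near-autoregressive) block size and
    switches to the next larger size each time a milestone step is passed.
    """
    idx = sum(step >= m for m in milestones)  # number of milestones reached
    return sizes[idx]

# Example: the schedule widens blocks as training progresses.
print(block_size_schedule(0))      # smallest block at initialization
print(block_size_schedule(1500))   # after the first milestone
print(block_size_schedule(10000))  # final, most parallel configuration
```

In practice the schedule would be queried inside the TraceRL training loop to set the block configuration for each rollout; the "smooth" transition described in the abstract could also be realized by mixing adjacent block sizes around each milestone rather than switching abruptly.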