🤖 AI Summary
This work addresses the challenge of increasing decoding parallelism in Masked Diffusion Language Models (MDMs) without sacrificing mathematical reasoning performance. We propose T*, a progressive block-size scaling training strategy built on trajectory-aware reinforcement learning (TraceRL): training begins from a small-block, autoregressive-style initialization and smoothly transitions to larger block configurations. This curriculum yields substantial gains in parallel decoding efficiency while preserving near-identical performance on mathematical reasoning benchmarks. Notably, the method introduces trajectory-aware reinforcement learning into MDM training for the first time, and further analysis shows it can also discover alternative decoding schedules that converge to comparable performance, demonstrating both effectiveness and generalization potential.
📝 Abstract
We present T*, a simple TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small-block MDM, T* transitions smoothly to larger blocks, enabling higher-parallelism decoding with minimal performance degradation on math reasoning benchmarks. Further analysis suggests that T* can also converge to an alternative decoding schedule that achieves comparable performance.
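The progressive block-size curriculum can be sketched as a step-indexed schedule that starts near-autoregressive and grows the block size as training proceeds. The milestone steps, block sizes, and function name below are illustrative placeholders, not values from the paper:

```python
# Hypothetical sketch of a progressive block-size curriculum for MDM training.
# Milestones and block sizes are made-up placeholders for illustration only.

def block_size_schedule(step, milestones=(1000, 3000, 6000), sizes=(4, 8, 16, 32)):
    """Return the decoding block size to train with at a given step.

    Training starts with the smallest (near-autoregressive) block size and
    switches to the next larger size each time a milestone step is passed.
    """
    idx = sum(step >= m for m in milestones)  # number of milestones reached
    return sizes[idx]

# Example: the schedule widens blocks as training progresses.
print(block_size_schedule(0))      # smallest block at initialization
print(block_size_schedule(1500))   # after the first milestone
print(block_size_schedule(10000))  # final, most parallel configuration
```

In practice the schedule would be queried inside the TraceRL training loop to set the block configuration for each rollout; the "smooth" transition described in the abstract could also be realized by mixing adjacent block sizes around each milestone rather than switching abruptly.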