Advancing Block Diffusion Language Models for Test-Time Scaling

πŸ“… 2026-02-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of balancing efficiency and accuracy in block-wise diffusion language models during test-time scaling and long-chain reasoning. The authors propose a unified framework that integrates a difficulty-aware Bounded Adaptive Confidence Decoding (BACD) strategy with a "Think Coarse, Critic Fine" (TCCF) paradigm, which dynamically adjusts both the denoising intensity and the block size. This is further supported by a Progressive Block Size Extension mechanism that mitigates degradation at large block sizes. Applied to the TDAR-8B model, the method achieves a 2.26× speedup over TraDo-8B while improving the AIME24 score by 11.2 points, substantially outperforming strong baselines.

πŸ“ Abstract
Recent advances in block diffusion language models (BDLMs) have demonstrated competitive performance and strong scalability on reasoning tasks. However, existing BDLMs remain underexplored in the test-time scaling setting and face more severe decoding challenges in long Chain-of-Thought reasoning, particularly in balancing decoding speed and effectiveness. In this work, we propose a unified framework for test-time scaling in BDLMs that introduces adaptivity in both decoding and block-wise generation. At the decoding level, we propose Bounded Adaptive Confidence Decoding (BACD), a difficulty-aware sampling strategy that dynamically adjusts denoising based on model confidence, accelerating inference while controlling error accumulation. Beyond step-wise adaptivity, we introduce Think Coarse, Critic Fine (TCCF), a test-time scaling paradigm that allocates large block sizes to exploratory reasoning and smaller block sizes to refinement, striking a balance between efficiency and effectiveness. To enable efficient and effective decoding with a large block size, we adopt Progressive Block Size Extension, which mitigates performance degradation when scaling block sizes. Extensive experiments show that applying BACD and TCCF to TDAR-8B yields significant improvements over strong baselines such as TraDo-8B (2.26× speedup, +11.2 points on AIME24). These results mark an important step toward unlocking the potential of BDLMs for test-time scaling in complex reasoning tasks.
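The abstract describes BACD as unmasking tokens adaptively based on model confidence while bounding the per-step commitment to control error accumulation. A minimal sketch of one such bounded, confidence-thresholded decoding step is below; the function name, the threshold `tau`, and the `min_unmask`/`max_unmask` bounds are illustrative assumptions, not the paper's actual implementation.

```python
def bacd_step(confidences, tau=0.9, min_unmask=1, max_unmask=4):
    """One illustrative bounded adaptive confidence decoding step.

    confidences: {position: confidence} for still-masked positions,
    e.g. the max softmax probability at each position.
    Returns the positions to unmask this step: all positions above the
    threshold tau, clamped to [min_unmask, max_unmask] so that easy
    (high-confidence) regions decode in fewer steps while hard regions
    stay cautious and commit fewer tokens at a time.
    """
    # Rank masked positions by confidence, highest first.
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    above = [p for p in ranked if confidences[p] >= tau]
    # Bound the number of tokens committed in this step.
    k = min(max(len(above), min_unmask), max_unmask, len(ranked))
    return ranked[:k]
```

With a mostly confident block, three positions clear the threshold and are unmasked together; with a uniformly uncertain block, the lower bound still forces one token forward so decoding cannot stall.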
Problem

Research questions and friction points this paper is trying to address.

block diffusion language models
test-time scaling
Chain-of-Thought reasoning
decoding efficiency
reasoning effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Block Diffusion Language Models
Test-Time Scaling
Adaptive Decoding
Chain-of-Thought Reasoning
Block-wise Generation
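The "Think Coarse, Critic Fine" paradigm above allocates large blocks to exploratory reasoning and small blocks to refinement. A simple schedule in that spirit can be sketched as follows; the phase fraction and the block sizes (64 and 16) are illustrative assumptions, not values from the paper.

```python
def tccf_block_sizes(total_tokens, think_frac=0.75, coarse=64, fine=16):
    """Illustrative 'Think Coarse, Critic Fine' block-size schedule.

    The first think_frac of the token budget (exploratory reasoning)
    is generated in large coarse blocks for speed; the remainder
    (refinement/critique) uses small fine blocks for careful decoding.
    """
    sizes, produced = [], 0
    think_budget = int(total_tokens * think_frac)
    while produced < total_tokens:
        size = coarse if produced < think_budget else fine
        size = min(size, total_tokens - produced)  # don't overshoot budget
        sizes.append(size)
        produced += size
    return sizes
```

For a 200-token budget this yields a few coarse 64-token blocks followed by small final blocks, mirroring the coarse-then-fine allocation the paper describes.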
πŸ”Ž Similar Papers
No similar papers found.