Advancing Block Diffusion Language Models for Test-Time Scaling

πŸ“… 2026-02-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of balancing efficiency and accuracy in block-wise diffusion language models during test-time scaling and long-chain reasoning. The authors propose a unified framework that integrates a difficulty-aware Bounded Adaptive Confidence Decoding (BACD) strategy with a "Think Coarse, Critic Fine" (TCCF) paradigm, which dynamically adjusts both the denoising intensity and the block size. This is further supported by a Progressive Block Size Extension mechanism that mitigates degradation at large block sizes. Applied to the TDAR-8B model, the method achieves a 2.26× speedup over TraDo-8B while improving the AIME24 score by 11.2 points, substantially outperforming strong baselines.

πŸ“ Abstract
Recent advances in block diffusion language models (BDLMs) have demonstrated competitive performance and strong scalability on reasoning tasks. However, existing BDLMs remain underexplored in the test-time scaling setting and face more severe decoding challenges in long Chain-of-Thought reasoning, particularly in balancing decoding speed and effectiveness. In this work, we propose a unified framework for test-time scaling in BDLMs that introduces adaptivity in both decoding and block-wise generation. At the decoding level, we propose Bounded Adaptive Confidence Decoding (BACD), a difficulty-aware sampling strategy that dynamically adjusts denoising based on model confidence, accelerating inference while controlling error accumulation. Beyond step-wise adaptivity, we introduce Think Coarse, Critic Fine (TCCF), a test-time scaling paradigm that allocates large block sizes to exploratory reasoning and smaller block sizes to refinement, striking a balance between efficiency and effectiveness. To enable efficient and effective decoding with a large block size, we adopt Progressive Block Size Extension, which mitigates performance degradation when scaling block sizes. Extensive experiments show that applying BACD and TCCF to TDAR-8B yields significant improvements over strong baselines such as TraDo-8B (2.26× speedup, +11.2 points on AIME24). These results mark an important step toward unlocking the potential of BDLMs for test-time scaling in complex reasoning tasks.
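The abstract describes BACD as unmasking tokens adaptively based on model confidence while bounding the per-step commitment to control error accumulation. A minimal sketch of one such bounded, confidence-thresholded decoding step is below; the function name, the threshold `tau`, and the `min_unmask`/`max_unmask` bounds are illustrative assumptions, not the paper's actual implementation.

```python
def bacd_step(confidences, tau=0.9, min_unmask=1, max_unmask=4):
    """One illustrative bounded adaptive confidence decoding step.

    confidences: {position: confidence} for still-masked positions,
    e.g. the max softmax probability at each position.
    Returns the positions to unmask this step: all positions above the
    threshold tau, clamped to [min_unmask, max_unmask] so that easy
    (high-confidence) regions decode in fewer steps while hard regions
    stay cautious and commit fewer tokens at a time.
    """
    # Rank masked positions by confidence, highest first.
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    above = [p for p in ranked if confidences[p] >= tau]
    # Bound the number of tokens committed in this step.
    k = min(max(len(above), min_unmask), max_unmask, len(ranked))
    return ranked[:k]
```

With a mostly confident block, three positions clear the threshold and are unmasked together; with a uniformly uncertain block, the lower bound still forces one token forward so decoding cannot stall.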
Problem

Research questions and friction points this paper is trying to address.

block diffusion language models
test-time scaling
Chain-of-Thought reasoning
decoding efficiency
reasoning effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Block Diffusion Language Models
Test-Time Scaling
Adaptive Decoding
Chain-of-Thought Reasoning
Block-wise Generation
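The "Think Coarse, Critic Fine" paradigm above allocates large blocks to exploratory reasoning and small blocks to refinement. A simple schedule in that spirit can be sketched as follows; the phase fraction and the block sizes (64 and 16) are illustrative assumptions, not values from the paper.

```python
def tccf_block_sizes(total_tokens, think_frac=0.75, coarse=64, fine=16):
    """Illustrative 'Think Coarse, Critic Fine' block-size schedule.

    The first think_frac of the token budget (exploratory reasoning)
    is generated in large coarse blocks for speed; the remainder
    (refinement/critique) uses small fine blocks for careful decoding.
    """
    sizes, produced = [], 0
    think_budget = int(total_tokens * think_frac)
    while produced < total_tokens:
        size = coarse if produced < think_budget else fine
        size = min(size, total_tokens - produced)  # don't overshoot budget
        sizes.append(size)
        produced += size
    return sizes
```

For a 200-token budget this yields a few coarse 64-token blocks followed by small final blocks, mirroring the coarse-then-fine allocation the paper describes.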
πŸ”Ž Similar Papers
No similar papers found.