Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of fixed-size decoding blocks in diffusion large language models (dLLMs), which hinder reasoning coherence and adaptability across tasks. The authors propose b1, a post-training framework that uses reinforcement learning to learn dynamic block sizes. Guided by the principle that entropy should decrease monotonically from block to block, the approach enables task-adaptive, logically coherent semi-autoregressive reasoning. Designed as a plug-and-play module, b1 integrates with existing dLLM post-training algorithms without requiring architectural modifications. Evaluations across multiple reasoning benchmarks show consistent improvement over fixed-size block baselines, indicating that dynamic block sizing improves both reasoning performance and coherence.

📝 Abstract

Recent diffusion large language models (dLLMs) have demonstrated both effectiveness and efficiency in reasoning via a block-based semi-autoregressive generation paradigm. Despite this progress, fixed-size block generation remains a critical bottleneck for effective and coherent reasoning. First, from a global perspective, different reasoning tasks correspond to different optimal decoding block sizes, which makes a "one-size-fits-all" assumption ineffective. Second, even within a single reasoning task, rigid block partitioning breaks the logical flow and reduces reasoning coherence. Through empirical observations, we reveal that block-wise entropy fluctuates unsteadily between blocks in incorrect reasoning, whereas correct reasoning follows a consistent descending trend. Therefore, this paper proposes b1, a novel post-training framework for dLLMs that learns dynamic-size reasoning blocks via a Monotonic Entropy Descent objective with reinforcement learning to enhance reasoning coherence. b1 integrates seamlessly as a plug-and-play module with existing dLLM post-training algorithms. Extensive experiments across various reasoning benchmarks showcase b1's consistent improvement over existing fixed-size block baselines. Our code has been released at https://github.com/YanJiangJerry/Block-R1.
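
The entropy observation in the abstract can be illustrated with a small sketch. The abstract does not specify how block-wise entropy is computed or how the Monotonic Entropy Descent objective is defined, so everything below is an assumption for illustration only: block entropy is taken as the mean Shannon entropy of the per-token predictive distributions in a block, and the hypothetical `descent_violation` helper simply sums the entropy increases between consecutive blocks (zero when the trend is non-increasing). None of these names come from the b1 codebase.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def block_entropies(
    token_dists: Sequence[Sequence[float]], block_sizes: Sequence[int]
) -> List[float]:
    """Mean token entropy per block (assumed definition, not the paper's)."""
    if any(size <= 0 for size in block_sizes):
        raise ValueError("block sizes must be positive")
    if sum(block_sizes) != len(token_dists):
        raise ValueError("block sizes must cover every token exactly once")
    entropies = []
    start = 0
    for size in block_sizes:
        block = token_dists[start:start + size]
        entropies.append(sum(token_entropy(d) for d in block) / size)
        start += size
    return entropies


def descent_violation(entropies: Sequence[float]) -> float:
    """Total entropy increase between consecutive blocks.

    Returns 0.0 when the sequence is monotonically non-increasing.
    A hypothetical reward term could be the negative of this value.
    """
    return sum(max(0.0, nxt - cur) for cur, nxt in zip(entropies, entropies[1:]))


# Toy distributions over a 4-symbol vocabulary.
uniform = [0.25, 0.25, 0.25, 0.25]   # high uncertainty
two_way = [0.5, 0.5, 0.0, 0.0]       # moderate uncertainty
certain = [1.0, 0.0, 0.0, 0.0]       # no uncertainty

descending = block_entropies(
    [uniform, uniform, two_way, two_way, certain, certain], [2, 2, 2]
)
fluctuating = block_entropies([certain, uniform, two_way], [1, 1, 1])
```

In this toy setup, `descent_violation(descending)` is zero because uncertainty shrinks from block to block, while `descent_violation(fluctuating)` is positive because the second block is more uncertain than the first. Changing `block_sizes` changes where the block boundaries fall and therefore the entropy trend, which is the quantity a learned block-size policy would be able to influence under these assumptions.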

Problem

Research questions and friction points this paper is trying to address.

diffusion large language models

reasoning coherence

block-based generation

dynamic block size

entropy descent

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic-size reasoning blocks

monotonic entropy descent

diffusion large language models