🤖 AI Summary
Large language models (LLMs) reason inefficiently on complex tasks because of their autoregressive generation paradigm: accuracy gains diminish while inference-time compute costs rise sharply. To address this, we propose DiffuReason, an efficient reasoning framework that combines diffusion language models (DLMs) with LLMs. Its core innovation is the first application of DLMs to reasoning. Their parallel denoising capability is used to generate diverse intermediate reasoning traces in batches, and an LLM then evaluates these traces and selects the high-quality candidates. This design breaks the scalability bottleneck inherent to autoregressive models in reasoning. On multiple challenging reasoning benchmarks, DiffuReason significantly reduces computational cost (e.g., 30–50% fewer FLOPs) while matching or exceeding the accuracy of standalone LLMs, achieving a better trade-off between computational efficiency and reasoning performance.
📝 Abstract
In recent years, large language models (LLMs) have advanced remarkably, and test-time scaling laws have consistently enhanced their reasoning capabilities. By systematically exploring and evaluating a diverse spectrum of intermediate thoughts, LLMs can generate deliberate reasoning steps, substantially improving reasoning accuracy. However, because of their autoregressive generation paradigm, LLMs' reasoning performance scales sub-optimally with test-time computation: proposing thoughts often incurs excessive computational overhead while yielding only marginal performance gains. In contrast, diffusion language models (DLMs) can efficiently produce diverse samples through parallel denoising in a single forward pass. This inspires us to use DLMs to propose intermediate thoughts, alleviating the computational burden of autoregressive generation while maintaining quality. In this work, we propose an efficient collaborative reasoning framework in which DLMs generate candidate thoughts and LLMs evaluate their quality. Experiments across diverse benchmarks demonstrate that our framework achieves strong performance on complex reasoning tasks, offering a promising direction for future research. Our code is open-source at https://anonymous.4open.science/r/Diffuse-Thinking-EC60.
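The abstract describes the collaborative framework only at a high level. The following is a minimal sketch of the general propose-then-evaluate pattern it refers to, under toy assumptions. `propose_thoughts` is a hypothetical stand-in for the DLM proposer that returns `k` candidate thoughts in one batched call, and `score_state` is a hypothetical stand-in for the LLM evaluator. The beam-search control loop and the integer-sum "reasoning" task are illustrative choices, not taken from the paper or its released code.

```python
import random
from typing import List, Tuple

State = Tuple[int, ...]  # hypothetical: a partial "reasoning trace" is a tuple of integer steps


def propose_thoughts(state: State, k: int, rng: random.Random) -> List[int]:
    """Hypothetical stand-in for the DLM proposer.

    A real DLM would denoise k candidate thoughts in parallel; this toy version
    simply samples k distinct integer "steps" from 1..9 in a single call.
    """
    return rng.sample(range(1, 10), k)


def score_state(state: State, target: int) -> int:
    """Hypothetical stand-in for the LLM evaluator (higher is better)."""
    return -abs(target - sum(state))


def collaborative_search(target: int, depth: int, beam: int, k: int, seed: int = 0) -> State:
    """Beam search: batched proposals per state, evaluator keeps the top `beam`."""
    rng = random.Random(seed)
    frontier: List[State] = [()]
    for _ in range(depth):
        candidates: List[State] = []
        for state in frontier:
            # One batched proposal call per frontier state (the "parallel" step).
            for thought in propose_thoughts(state, k, rng):
                candidates.append(state + (thought,))
        # The evaluator ranks every candidate; only the best `beam` survive.
        candidates.sort(key=lambda s: score_state(s, target), reverse=True)
        frontier = candidates[:beam]
        if score_state(frontier[0], target) == 0:
            break  # the evaluator considers this trace complete
    return frontier[0]


if __name__ == "__main__":
    best = collaborative_search(target=10, depth=3, beam=2, k=9)
    print(best, sum(best))
```

In this sketch the cost split mirrors the idea in the abstract: candidate generation happens in one batched call per frontier state, while the evaluator only ranks and prunes. Swapping the stubs for real DLM and LLM calls is left as an assumption.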