🤖 AI Summary
This work addresses the inefficiency of parallel decoding in autoregressive large language models and the limited reasoning capabilities of diffusion-based large language models (dLLMs). To this end, the authors propose dVoting, a novel method that combines token-level voting with the non-autoregressive, arbitrary-position generation capacity of dLLMs. Without requiring additional training, dVoting performs iterative refinement through multiple rounds of parallel sampling: it identifies uncertain tokens via consistency analysis and regenerates them through voting until convergence. Experimental results demonstrate substantial performance gains across multiple benchmarks: GSM8K (+6.22%–7.66%), MATH500 (+4.40%–7.20%), ARC-C (+3.16%–14.84%), and MMLU (+4.83%–5.74%), all while maintaining manageable computational overhead.
📝 Abstract
Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, endowing them with significant potential for parallel test-time scaling, which was previously constrained by the severe inefficiency of autoregressive decoding. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training, at only an acceptable extra computational cost. dVoting is motivated by the observation that, across multiple samples for the same prompt, token predictions remain largely consistent, whereas performance is determined by a small subset of tokens exhibiting cross-sample variability. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement by sampling, identifying uncertain tokens via consistency analysis, regenerating them through voting, and repeating this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks, with gains of 6.22%–7.66% on GSM8K, 4.40%–7.20% on MATH500, 3.16%–14.84% on ARC-C, and 4.83%–5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting.
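The core loop described in the abstract — sample, locate cross-sample disagreements, vote, regenerate — can be sketched in a few lines. This is a minimal illustration, not the authors' actual implementation; the function name and the toy token sequences are hypothetical, and a real pipeline would feed the uncertain positions back to the dLLM for arbitrary-position regeneration rather than stopping at the vote.

```python
from collections import Counter

def dvoting_round(samples):
    """One dVoting-style refinement round (sketch): given k sampled
    token sequences for the same prompt, majority-vote each position
    and flag positions where the samples disagree. In dVoting proper,
    the flagged positions would be regenerated by the dLLM and the
    round repeated until no uncertain positions remain."""
    length = len(samples[0])
    voted, uncertain = [], []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in samples)
        token, freq = counts.most_common(1)[0]
        voted.append(token)
        if freq < len(samples):    # cross-sample variability at this position
            uncertain.append(pos)  # candidate for regeneration
    return voted, uncertain

# Toy example: three samples agree everywhere except position 2.
samples = [
    ["The", "answer", "is", "42"],
    ["The", "answer", "was", "42"],
    ["The", "answer", "is", "42"],
]
voted, uncertain = dvoting_round(samples)
print(voted)      # ['The', 'answer', 'is', '42']
print(uncertain)  # [2]
```

The example reflects the abstract's key observation: most positions are consistent across samples, so only the small uncertain subset needs further work, which keeps the extra overhead modest.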