🤖 AI Summary
This work addresses the inefficiency and error propagation in existing speculative inference methods for multimodal large language models, which often arise from misalignment between draft and target reasoning trajectories. To overcome these limitations, the authors propose the DREAM-R framework, which introduces Speculative Alignment Policy Optimization (SAPO) and a Threshold-Based Verification Mechanism (TBVM), integrated within a Fully Parallel Speculative Reasoning (FPSR) architecture to enable efficient parallel generation and verification of multi-step drafts. By combining reinforcement learning–based draft model training, ratio-based threshold verification, and an early-stopping fallback strategy, DREAM-R achieves significant acceleration across multiple complex multimodal reasoning benchmarks while preserving the accuracy of the target model, effectively balancing inference speed and correctness.
📝 Abstract
Speculative reasoning has recently been proposed as a means to accelerate reasoning-intensive generation in large multimodal models, but its effectiveness is often constrained by misalignment between speculative drafts and target-verified reasoning. In this work, we introduce DREAM-R, a framework that substantially improves the performance of speculative reasoning. At its core, DREAM-R employs Speculative Alignment Policy Optimization (SAPO), a reinforcement-learning objective that trains draft models to generate reasoning steps that are both faithful to target trajectories and concise. We further propose a Threshold-based Verification Mechanism (TBVM) that uses a ratio-based criterion to provide stable and interpretable acceptance of speculative steps only when positive evidence clearly dominates, thereby preventing error propagation. Building on these components, we develop a Fully Parallel Speculative Reasoning (FPSR) framework that parallelizes draft generation, target-side reasoning, and verification across multi-step reasoning, enabling early stopping and clean fallback. Experiments on reasoning-heavy benchmarks demonstrate substantial speedups while preserving target-model accuracy.
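The ratio-based acceptance idea behind TBVM can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define "positive evidence", so the sketch assumes the target model yields a log-probability for an "accept" verdict and one for a "reject" verdict on each draft step. The names `accept_step` and `count_accepted_steps` and the default threshold of 2.0 are hypothetical. Verification stops at the first rejected step, after which the target model would take over (the fallback).

```python
import math

def accept_step(logp_accept: float, logp_reject: float, threshold: float = 2.0) -> bool:
    """Accept a draft step only if P(accept) / P(reject) >= threshold.

    Assumption: the two log-probabilities come from the target model's
    verdict on the step; the source does not specify this signal.
    """
    # Compare in log space: log(p_a / p_r) = logp_accept - logp_reject.
    return (logp_accept - logp_reject) >= math.log(threshold)

def count_accepted_steps(step_scores, threshold: float = 2.0) -> int:
    """Return how many leading draft steps pass; stop at the first failure."""
    accepted = 0
    for logp_accept, logp_reject in step_scores:
        if not accept_step(logp_accept, logp_reject, threshold):
            break  # early stop: later drafts build on a rejected step
        accepted += 1
    return accepted

# Hypothetical verdict probabilities for four draft steps.
example_scores = [
    (math.log(0.90), math.log(0.10)),  # ratio 9.0
    (math.log(0.70), math.log(0.30)),  # ratio about 2.33
    (math.log(0.55), math.log(0.45)),  # ratio about 1.22, below threshold
    (math.log(0.95), math.log(0.05)),  # never reached
]
n_accepted = count_accepted_steps(example_scores)
```

With these made-up scores, the first two steps are accepted and the third is rejected, so the fourth is never checked even though it would pass on its own. A ratio rule of this kind makes the threshold easy to read: a value of 2.0 means acceptance must be at least twice as likely as rejection.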