🤖 AI Summary
This work addresses the efficiency–accuracy trade-off in collaboration between small reasoning models (SRMs) and large reasoning models (LRMs), where a central difficulty is detecting and mitigating SRM reasoning failures. By analyzing SRM inference in both the generated text and the hidden-state space, the study identifies three characteristic failure modes: overconfidence, uncertainty, and heavy revalidation. Building on this analysis, the authors propose RankGuide, which combines tensor-rank-guided routing (deciding when to invoke the LRM) with tensor-rank-filtered steering vector extraction (modulating the SRM's reasoning trajectory). Across multiple reasoning benchmarks, the framework reduces latency by up to 1.75× compared with the LRM while maintaining accuracy competitive with prior methods.
📝 Abstract
Large reasoning models (LRMs) enhance problem-solving capabilities by generating explicit multi-step chain-of-thought (CoT) reasoning; however, they incur substantial inference latency and computational overhead. To mitigate this issue, recent work has explored model collaboration paradigms in which small reasoning models (SRMs) generate intermediate reasoning steps to achieve a better accuracy--latency trade-off. Despite this progress, detecting and mitigating SRM failures effectively and efficiently in collaborative systems remains a key challenge. To address this challenge, we analyze SRM inference in both the generated-text and hidden-state spaces, and identify three failure modes: \textit{overconfidence}, \textit{uncertainty}, and \textit{heavy revalidation}. Building on these insights, we propose \textbf{RankGuide}, a framework that improves the efficiency and effectiveness of SRM--LRM collaboration through tensor-rank-guided routing and steering. Specifically, RankGuide uses a routing signal that incorporates tensor-rank information derived from consecutive hidden states to detect when SRMs are likely to fail and to selectively invoke LRMs. In addition, we introduce a tensor-rank-filtered steering vector extraction method that modulates the reasoning trajectory of SRMs, thereby improving their generation quality. By improving both routing and steering through tensor-rank signals, RankGuide enables SRM--LRM collaborative systems to reason more efficiently, with fewer steps and improved accuracy. Experiments on multiple reasoning benchmarks show that RankGuide reduces latency by up to $1.75\times$ compared with using the LRM alone, while maintaining competitive accuracy relative to prior methods.
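
The abstract does not specify how the tensor-rank signal is computed or thresholded, so the following is only a rough illustration of the general idea of routing on a rank statistic taken over consecutive hidden states. It is not RankGuide's method. As stated assumptions: it substitutes the numerical matrix rank of a small window of stacked hidden states for the paper's tensor-rank signal, and the names `numerical_rank`, `should_escalate`, `window`, and `min_rank`, as well as the rule "low rank triggers escalation to the LRM", are hypothetical choices made for this sketch.

```python
from typing import List, Sequence


def numerical_rank(rows: Sequence[Sequence[float]], tol: float = 1e-6) -> int:
    """Rank of a small matrix via Gaussian elimination with partial pivoting.

    Assumption: matrix rank is used here as a simple stand-in for the
    tensor-rank signal mentioned in the abstract.
    """
    m: List[List[float]] = [list(map(float, r)) for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def should_escalate(
    hidden_states: Sequence[Sequence[float]],
    window: int = 3,      # hypothetical: number of consecutive states inspected
    min_rank: int = 2,    # hypothetical: escalation threshold
    tol: float = 1e-6,
) -> bool:
    """Hypothetical routing rule: hand off to the LRM when the most recent
    `window` hidden states span fewer than `min_rank` directions."""
    if len(hidden_states) < window:
        return False  # not enough history to judge yet
    recent = hidden_states[-window:]
    return numerical_rank(recent, tol) < min_rank


# Toy 4-dimensional "hidden states" (made-up values, not model outputs).
collapsed = [[1, 0, 0, 0], [2, 0, 0, 0], [3, 0, 0, 0]]  # all along one direction
diverse = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]    # three distinct directions
escalate_collapsed = should_escalate(collapsed)
escalate_diverse = should_escalate(diverse)
```

In this toy setting the collapsed trajectory triggers escalation and the diverse one does not. A practical system would operate on real SRM hidden states and would need thresholds calibrated on data, and the direction of the rule itself would have to be validated against the failure modes the paper describes.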