Think Only When You Need with Large Hybrid-Reasoning Models

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models (LRMs) incur unnecessary token consumption and latency by enforcing deep, multi-step reasoning even for simple queries. Method: We propose Large Hybrid-Reasoning Models (LHRMs), the first framework to adaptively select between "thinking" and "direct answer" modes based on query difficulty. We design a context-aware reasoning gating mechanism and a two-stage training framework, Hybrid Fine-Tuning (HFT) followed by Hybrid Group Policy Optimization (HGPO), to implicitly learn the inference-mode decision. Contribution/Results: We introduce Hybrid Accuracy, an evaluation metric that jointly accounts for answer correctness and mode appropriateness. Experiments across diverse reasoning benchmarks show that LHRMs outperform state-of-the-art LRMs and LLMs: they substantially reduce token usage and end-to-end latency while maintaining or improving overall accuracy, striking a strong balance between efficiency and performance.
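The paper trains the mode choice implicitly via HFT and HGPO rather than using an external classifier, but the inference-time contract can be sketched as a simple dispatcher. The sketch below is purely illustrative: the `gate`, `think_fn`, and `direct_fn` callables and the mode labels are hypothetical stand-ins, not the paper's implementation.

```python
from typing import Callable

def hybrid_generate(query: str,
                    gate: Callable[[str], str],
                    think_fn: Callable[[str], str],
                    direct_fn: Callable[[str], str]) -> str:
    """Route a query to extended thinking or a direct answer.

    `gate` stands in for the learned context-aware gating; in an LHRM the
    model itself emits the mode decision as part of generation.
    """
    mode = gate(query)  # "think" or "no_think"
    if mode == "think":
        return think_fn(query)   # long chain-of-thought path
    return direct_fn(query)      # short direct-answer path

# Toy stand-ins for demonstration only (a real gate is learned, not rule-based).
toy_gate = lambda q: "think" if any(c.isdigit() for c in q) else "no_think"
toy_think = lambda q: f"<think>...</think> answer to: {q}"
toy_direct = lambda q: f"answer to: {q}"

print(hybrid_generate("What is 17 * 23?", toy_gate, toy_think, toy_direct))
print(hybrid_generate("Hi, how are you?", toy_gate, toy_think, toy_direct))
```

The point of the dispatcher shape is that the expensive thinking path is only entered when the gate requests it, which is where the token and latency savings come from.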

📝 Abstract
Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in terms of token consumption and latency, which is particularly unnecessary for simple queries. In this work, we introduce Large Hybrid-Reasoning Models (LHRMs), the first kind of model capable of adaptively determining whether to perform thinking based on the contextual information of user queries. To achieve this, we propose a two-stage training pipeline comprising Hybrid Fine-Tuning (HFT) as a cold start, followed by online reinforcement learning with the proposed Hybrid Group Policy Optimization (HGPO) to implicitly learn to select the appropriate thinking mode. Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model's capability for hybrid thinking. Extensive experimental results show that LHRMs can adaptively perform hybrid thinking on queries of varying difficulty and type. They outperform existing LRMs and LLMs in reasoning and general capabilities while significantly improving efficiency. Together, our work advocates for a reconsideration of the appropriate use of extended thinking processes and provides a solid starting point for building hybrid thinking systems.
Problem

Research questions and friction points this paper is trying to address.

Adaptively determine thinking need for queries
Reduce token overhead and latency in reasoning
Improve hybrid thinking accuracy and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive thinking based on query context
Two-stage training with HFT and HGPO
Hybrid Accuracy metric for hybrid thinking
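The Hybrid Accuracy idea above, scoring a response only when both the answer and the chosen mode are right, can be sketched as a joint indicator averaged over examples. This is an assumed simplification for illustration; the paper's exact definition may differ.

```python
def hybrid_accuracy(records):
    """Fraction of examples where the answer is correct AND the chosen
    mode matches the appropriate mode for the query.

    Each record: (answer_correct: bool, chosen_mode: str, appropriate_mode: str).
    Assumed joint-indicator form, not the paper's exact formula.
    """
    if not records:
        return 0.0
    hits = sum(1 for correct, chosen, appropriate in records
               if correct and chosen == appropriate)
    return hits / len(records)

data = [
    (True,  "think",    "think"),     # correct answer, right mode -> hit
    (True,  "think",    "no_think"),  # correct but over-thought   -> miss
    (False, "no_think", "no_think"),  # right mode, wrong answer   -> miss
    (True,  "no_think", "no_think"),  # hit
]
print(hybrid_accuracy(data))  # -> 0.5
```

Note how the second record is penalized even though the answer is correct: a metric of this shape discourages spending thinking tokens on queries that do not need them.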