Demystifying Hybrid Thinking: Can LLMs Truly Switch Between Think and No-Think?

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current hybrid reasoning large language models suffer from “reasoning leakage”: they unnecessarily generate lengthy reasoning traces even in direct-answer (no-think) mode, undermining inference efficiency and mode controllability. This work is the first to systematically identify four key factors affecting mode separability and proposes a novel training paradigm grounded in cross-problem sample construction, balanced data mixing, and two-stage fine-tuning. Crucially, the method preserves high accuracy in both think and no-think modes while substantially improving mode isolation. On MATH500, it reduces no-think output length from 1,085 to 585 tokens and slashes reasoning-support token frequency from 5,917 to 522 occurrences. The study establishes a reproducible methodology and empirical benchmark for controllable reasoning-mode modeling, advancing principled design of dual-mode LLMs.

📝 Abstract
Hybrid thinking enables LLMs to switch between reasoning and direct answering, offering a balance between efficiency and reasoning capability. Yet our experiments reveal that current hybrid thinking LLMs achieve only partial mode separation: reasoning behaviors often leak into the no-think mode. To understand and mitigate this, we analyze the factors influencing controllability and identify four that matter most: (1) larger data scale, (2) using think and no-think answers from different questions rather than the same question, (3) a moderate increase in the amount of no-think data, and (4) a two-phase strategy that first trains reasoning ability and then applies hybrid think training. Building on these findings, we propose a practical recipe that, compared to standard training, maintains accuracy in both modes while significantly reducing no-think output length (from 1,085 to 585 tokens on MATH500) and occurrences of reasoning-supportive tokens such as "wait" (from 5,917 to 522 on MATH500). Our findings highlight the limitations of current hybrid thinking and offer directions for strengthening its controllability.

Problem

Research questions and friction points this paper is trying to address.

Investigating mode separation in hybrid thinking LLMs
Analyzing factors influencing think/no-think controllability
Developing training methods to reduce reasoning leakage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-phase training strategy for hybrid thinking
Reduces reasoning leakage in no-think mode
Optimizes data scale and question separation
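The recipe above can be sketched in code. The snippet below is an illustrative reconstruction of the cross-problem sample construction and balanced data mixing described in the abstract, not the authors' released implementation: the mode tags (`/think`, `/no_think`), field names, and the `no_think_ratio` parameter are all assumptions for the sake of the example.

```python
import random

# Hypothetical mode-control tags appended to the prompt (assumed, not from the paper).
THINK_TAG, NO_THINK_TAG = "/think", "/no_think"

def build_hybrid_dataset(think_pool, no_think_pool, no_think_ratio=0.3, seed=0):
    """Mix think and no-think SFT samples for hybrid-thinking training.

    Cross-problem constraint (factor 2 in the paper): no-think answers come
    from *different* questions than the think answers, so the model cannot
    pair both modes on the same problem. `no_think_ratio` models factor 3,
    a moderate amount of no-think data relative to think data.
    """
    rng = random.Random(seed)
    think_questions = {ex["question"] for ex in think_pool}

    # Drop any no-think sample whose question also appears in the think pool.
    disjoint_no_think = [
        ex for ex in no_think_pool if ex["question"] not in think_questions
    ]

    n_no_think = min(int(len(think_pool) * no_think_ratio), len(disjoint_no_think))
    mixed = [
        # Think samples: target includes the reasoning trace plus the answer.
        {"prompt": f'{ex["question"]} {THINK_TAG}',
         "target": ex["reasoning"] + ex["answer"]}
        for ex in think_pool
    ] + [
        # No-think samples: target is the direct answer only.
        {"prompt": f'{ex["question"]} {NO_THINK_TAG}', "target": ex["answer"]}
        for ex in rng.sample(disjoint_no_think, n_no_think)
    ]
    rng.shuffle(mixed)
    return mixed
```

In the paper's two-phase strategy, a dataset like this would be used only in the second phase, after the model has first been trained on reasoning data alone.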
Shouren Wang
Case Western Reserve University
Wang Yang
Case Western Reserve University
Xianxuan Long
Case Western Reserve University
Qifan Wang
Meta AI
Vipin Chaudhary
Case Western Reserve University
Research interests: High Performance Computing, Artificial Intelligence, Data Science, Computer Vision, Quantum Computing
Xiaotian Han
Research Scientist, OpenAI
Research interests: Machine Learning, Computer Vision, Multimodal, GenAI, LLM