🤖 AI Summary
Current hybrid reasoning large language models suffer from “reasoning leakage”: they unnecessarily generate lengthy reasoning traces even in direct-answer (no-think) mode, undermining inference efficiency and mode controllability. This work is the first to systematically identify four key factors affecting mode separability and proposes a novel training paradigm grounded in cross-problem sample construction, balanced data mixing, and two-stage fine-tuning. Crucially, the method preserves high accuracy in both think and no-think modes while substantially improving mode isolation. On MATH500, it reduces no-think output length from 1,085 to 585 tokens and cuts occurrences of reasoning-supportive tokens such as `wait` from 5,917 to 522. The study establishes a reproducible methodology and empirical benchmark for controllable reasoning-mode modeling, advancing principled design of dual-mode LLMs.
📝 Abstract
Hybrid thinking enables LLMs to switch between reasoning and direct answering, balancing efficiency and reasoning capability. Yet our experiments reveal that current hybrid thinking LLMs achieve only partial mode separation: reasoning behaviors often leak into the no-think mode. To understand and mitigate this, we analyze the factors influencing controllability and identify four that matter most: (1) larger data scale, (2) using think and no-think answers from different questions rather than the same question, (3) a moderate increase in the amount of no-think data, and (4) a two-phase strategy that first trains reasoning ability and then applies hybrid-think training. Building on these findings, we propose a practical recipe that, compared to standard training, maintains accuracy in both modes while significantly reducing no-think output length (from $1085$ to $585$ tokens on MATH500) and occurrences of reasoning-supportive tokens such as ``\texttt{wait}'' (from $5917$ to $522$ on MATH500). Our findings highlight the limitations of current hybrid thinking and offer directions for strengthening its controllability.
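The cross-problem sample construction and data mixing described above could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `build_hybrid_dataset`, the pool format, and the `nothink_ratio` parameter are all assumptions introduced here.

```python
import random

def build_hybrid_dataset(think_pool, nothink_pool, nothink_ratio=1.5, seed=0):
    """Mix think and no-think SFT samples (illustrative sketch).

    think_pool / nothink_pool: lists of (question, answer) pairs drawn from
    *disjoint* question sets, so the two modes never share a question
    (the cross-problem construction). nothink_ratio moderately up-weights
    no-think data, per finding (3).
    """
    rng = random.Random(seed)
    # Keep every think sample as-is.
    data = [{"mode": "think", "question": q, "answer": a}
            for q, a in think_pool]
    # Moderately oversample no-think data by the given ratio.
    n_nothink = int(len(nothink_pool) * nothink_ratio)
    for _ in range(n_nothink):
        q, a = rng.choice(nothink_pool)
        data.append({"mode": "no_think", "question": q, "answer": a})
    rng.shuffle(data)
    return data
```

Under finding (4), a dataset like this would be used only in the second phase, after an initial phase that trains reasoning ability alone.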