Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency

📅 2025-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models often suffer from redundant inference triggered by explicit self-reflective tokens (e.g., "Wait", "Hmm"), leading to unnecessarily long reasoning paths and reduced efficiency. This work systematically demonstrates that such tokens are unnecessary for high-level reasoning. The authors propose a zero-shot, plug-and-play, lightweight intervention: inference-time token suppression — specifically, disabling explicit thinking tokens during generation. Using chain-of-thought trajectory analysis and a unified cross-modal (text/vision/video) evaluation, the method achieves a 27%–51% reduction in reasoning path length across ten benchmarks and five R1-series models, with no degradation in downstream task performance. The core contribution is establishing an "implicit reasoning over explicit self-reflection" paradigm, offering a scalable path-pruning solution for large reasoning models that improves conciseness and computational efficiency without sacrificing capability.


📝 Abstract
Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by up to 27%-51% in five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
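The mechanism described here, suppressing specific tokens at decoding time, can be sketched as masking their logits before sampling. The snippet below is a minimal, self-contained illustration of that idea using a toy vocabulary; the function name and token ids are assumptions for demonstration, not the paper's actual implementation.

```python
import math

def suppress_tokens(logits, banned_ids):
    """Return a copy of `logits` with banned token ids set to -inf,
    so those tokens can never be sampled or chosen greedily.
    This mirrors the NoWait idea of disabling self-reflection
    tokens (e.g. "Wait", "Hmm") during inference."""
    out = list(logits)
    for i in banned_ids:
        out[i] = -math.inf
    return out

# Toy vocabulary: index 2 stands in for the "Wait" token.
vocab = ["the", "answer", "Wait", "is"]
logits = [1.0, 2.0, 5.0, 0.5]  # greedy decoding would pick "Wait"

suppressed = suppress_tokens(logits, banned_ids={2})
best = max(range(len(suppressed)), key=suppressed.__getitem__)
print(vocab[best])  # "answer" — the reflection token is no longer selectable
```

In practice, decoding libraries expose hooks for exactly this kind of intervention; for instance, HuggingFace's `GenerationConfig` accepts a `suppress_tokens` list of token ids to mask during generation, though whether the paper uses that mechanism is not stated here.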
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant outputs in reasoning models
Eliminates need for self-reflection tokens like 'Wait'
Improves efficiency without compromising model utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disables explicit self-reflection tokens
Reduces reasoning trajectory length
Plug-and-play efficient multimodal reasoning