Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency

📅 2025-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models often suffer from redundant inference triggered by explicit self-reflective tokens (e.g., "Wait", "Hmm"), leading to unnecessarily long reasoning paths and reduced efficiency. This work systematically demonstrates that such tokens are unnecessary for high-level reasoning. The authors propose a zero-shot, plug-and-play, lightweight intervention: inference-time token suppression — specifically, disabling explicit thinking tokens during generation. Using chain-of-thought trajectory analysis and a unified cross-modal (text/vision/video) evaluation, the method achieves a 27%–51% reduction in reasoning path length across ten benchmarks and five R1-series models, with no degradation in downstream task performance. The core contribution is establishing an "implicit reasoning over explicit self-reflection" paradigm, offering a scalable path-pruning solution for large reasoning models that improves conciseness and computational efficiency without sacrificing capability.


📝 Abstract
Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by up to 27%-51% in five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
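The mechanism described here, suppressing specific tokens at decoding time, can be sketched as masking their logits before sampling. The snippet below is a minimal, self-contained illustration of that idea using a toy vocabulary; the function name and token ids are assumptions for demonstration, not the paper's actual implementation.

```python
import math

def suppress_tokens(logits, banned_ids):
    """Return a copy of `logits` with banned token ids set to -inf,
    so those tokens can never be sampled or chosen greedily.
    This mirrors the NoWait idea of disabling self-reflection
    tokens (e.g. "Wait", "Hmm") during inference."""
    out = list(logits)
    for i in banned_ids:
        out[i] = -math.inf
    return out

# Toy vocabulary: index 2 stands in for the "Wait" token.
vocab = ["the", "answer", "Wait", "is"]
logits = [1.0, 2.0, 5.0, 0.5]  # greedy decoding would pick "Wait"

suppressed = suppress_tokens(logits, banned_ids={2})
best = max(range(len(suppressed)), key=suppressed.__getitem__)
print(vocab[best])  # "answer" — the reflection token is no longer selectable
```

In practice, decoding libraries expose hooks for exactly this kind of intervention; for instance, HuggingFace's `GenerationConfig` accepts a `suppress_tokens` list of token ids to mask during generation, though whether the paper uses that mechanism is not stated here.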
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant outputs in reasoning models
Eliminates need for self-reflection tokens like 'Wait'
Improves efficiency without compromising model utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disables explicit self-reflection tokens
Reduces reasoning trajectory length
Plug-and-play efficient multimodal reasoning