Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low inference efficiency of large language models (LLMs) caused by the "computation-for-performance" paradigm, this paper identifies that inference uncertainty is highly localized at high-entropy tokens. Based on this insight, we propose the Minimal Test-time Intervention (MTI) framework. MTI selectively activates classifier-free guidance (CFG) and lightweight negative prompting, leveraging KV-cache reuse, only at high-uncertainty token positions; it requires no fine-tuning or additional training and incurs negligible computational overhead. This design exploits the inherently local nature of reasoning errors, eliminating globally redundant interventions and thereby improving inference stability and accuracy. MTI delivers consistent gains across diverse domains: a +1.35% average improvement for Qwen3-8B-Base on general, coding, and STEM benchmarks, and +5% on AIME2024 for Qwen3-32B-Reasoning.
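The "uncertainty is localized at high-entropy tokens" observation can be made concrete with per-token Shannon entropy over the next-token distribution. A minimal sketch, assuming raw logits are available at each decoding step; the `THRESHOLD` value is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    z = logits - logits.max()          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

# Toy example: a peaked distribution vs. a flat one over a 4-token vocab.
confident = np.array([10.0, 0.0, 0.0, 0.0])
uncertain = np.array([1.0, 1.0, 1.0, 1.0])

THRESHOLD = 1.0  # hypothetical cutoff; a real system would tune this per model
flag_confident = token_entropy(confident) > THRESHOLD  # False: skip intervention
flag_uncertain = token_entropy(uncertain) > THRESHOLD  # True: intervene here
```

Because only the flagged (high-entropy) positions trigger the extra guidance pass, the overhead scales with the small fraction of uncertain tokens rather than the full sequence length.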

📝 Abstract
Recent progress in large language models (LLMs) has focused on test-time scaling to improve reasoning via increased inference computation, but often at the cost of efficiency. We revisit test-time behavior and uncover a simple yet underexplored phenomenon: reasoning uncertainty is highly localized; only a small subset of high-entropy tokens dominantly affects output correctness. Motivated by this, we propose Minimal Test-Time Intervention (MTI), a training-free framework that enhances reasoning accuracy and stability with minimal overhead. MTI includes: (i) Selective CFG intervention, applying classifier-free guidance only at uncertain positions; and (ii) Lightweight negative-prompt guidance, reusing the main model's KV cache to approximate unconditional decoding efficiently. MTI yields consistent gains across general, coding, and STEM tasks (e.g., +1.35% average improvement on eight benchmarks for Qwen3-8B-Base and +5% on AIME2024 using Qwen3-32B-Reasoning) while remaining highly efficient.
Problem

Research questions and friction points this paper is trying to address.

Improving LLM reasoning with minimal test-time intervention
Addressing reasoning uncertainty through localized high-entropy tokens
Enhancing accuracy and efficiency without training overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective CFG intervention at uncertain positions
Lightweight negative-prompt guidance reusing KV cache
Training-free framework enhancing reasoning with minimal overhead
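The two contributions above compose naturally into one decoding-step rule: gate on entropy, and only then blend the conditional logits with logits from a negative/unconditional pass. A minimal sketch, assuming per-step logits from both passes; `gamma` and `tau` are illustrative assumptions, and the paper's actual KV-cache-sharing approximation is abstracted away behind `neg_logits`:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def selective_cfg(cond_logits, neg_logits, gamma=1.5, tau=1.0):
    """Apply classifier-free guidance only at uncertain positions.

    cond_logits: next-token logits from the main (conditional) pass.
    neg_logits:  logits from the negative/unconditional pass (in MTI,
                 approximated cheaply by reusing the main model's KV cache).
    gamma, tau:  hypothetical guidance scale and entropy threshold.
    """
    p = softmax(cond_logits)
    entropy = -(p * np.log(p + 1e-12)).sum()
    if entropy <= tau:
        return cond_logits  # confident position: leave decoding untouched
    # CFG blend: push the distribution away from the negative prompt.
    return neg_logits + gamma * (cond_logits - neg_logits)
```

Since the gate returns the conditional logits unchanged at most positions, the guidance pass runs only on the small high-entropy subset, which is what keeps the overall overhead negligible.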