Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large reasoning models (LRMs) built on Mixture-of-Experts (MoE) architectures often reason inefficiently, showing both overthinking and underthinking. To address this, we propose RICE (Reinforcing Cognitive Experts), a training-free method that steers reasoning at inference time. RICE uses normalized Pointwise Mutual Information (nPMI) to identify "cognitive experts," specialized experts responsible for meta-level reasoning, and amplifies their routing weights during inference to encourage structured, goal-directed reasoning. RICE requires no fine-tuning, prompt engineering, or additional parameters, so it stays interpretable and computationally lightweight. Experiments on DeepSeek-R1 and Qwen3-235B show that RICE consistently improves accuracy on quantitative and scientific reasoning benchmarks, improves cognitive efficiency, and strengthens cross-domain generalization, outperforming prevalent prompting and decoding-constraint methods.
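The reinforcement step can be pictured as a small change to a standard top-k MoE router. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a softmax router with top-k selection followed by renormalization of the selected weights, and the function name `route_with_reinforcement` and the scaling factor `beta` are hypothetical. Here the gate scores of the cognitive experts are multiplied by `beta` before selection, so those experts become both more likely to be chosen and more heavily weighted.

```python
import math
from typing import Sequence


def route_with_reinforcement(
    router_logits: Sequence[float],
    top_k: int,
    cognitive_experts: Sequence[int],
    beta: float = 1.0,
) -> dict[int, float]:
    """Return {expert_id: weight} for one token.

    Hypothetical sketch: gate scores of `cognitive_experts` are scaled by
    `beta` before top-k selection; beta=1.0 gives an ordinary top-k router.
    """
    # Softmax over router logits (max-subtracted for numerical stability).
    m = max(router_logits)
    exps = [math.exp(z - m) for z in router_logits]
    total = sum(exps)
    gates = [x / total for x in exps]

    # Assumed reinforcement: multiply each cognitive expert's gate score by beta.
    cognitive = set(cognitive_experts)
    scores = [g * beta if e in cognitive else g for e, g in enumerate(gates)]

    # Top-k selection, then renormalize so the selected weights sum to 1.
    chosen = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
    z = sum(scores[e] for e in chosen)
    return {e: scores[e] / z for e in chosen}
```

With `beta=1.0` the function behaves like a plain top-k router; larger values push the cognitive experts into the selected set and raise their share of the mixture. Whether the boost is applied before or after top-k selection is a design choice that this sketch fixes arbitrarily.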

📝 Abstract

Mixture-of-Experts (MoE) architectures within Large Reasoning Models (LRMs) have achieved impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes. Despite notable advances, existing reasoning models often suffer from cognitive inefficiencies like overthinking and underthinking. To address these limitations, we introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning performance without additional training or complex heuristics. Leveraging normalized Pointwise Mutual Information (nPMI), we systematically identify specialized experts, termed "cognitive experts," that orchestrate meta-level reasoning operations characterized by tokens like "<think>". Empirical evaluations with leading MoE-based LRMs (DeepSeek-R1 and Qwen3-235B) on rigorous quantitative and scientific reasoning benchmarks demonstrate noticeable and consistent improvements in reasoning accuracy, cognitive efficiency, and cross-domain generalization. Crucially, our lightweight approach substantially outperforms prevalent reasoning-steering techniques, such as prompt design and decoding constraints, while preserving the model's general instruction-following skills. These results highlight reinforcing cognitive experts as a promising, practical, and interpretable direction to enhance cognitive efficiency within advanced reasoning models.
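For reference, the nPMI between two events x and y is PMI(x; y) / (-log p(x, y)), where PMI(x; y) = log[p(x, y) / (p(x) p(y))]; it ranges from -1 (the events never co-occur) to 1 (they always co-occur). The sketch below scores experts this way under assumptions that are not taken from the paper: x is "expert e is among the experts activated at a token position," and y is "the token is a reasoning marker" (for example, a thinking delimiter such as `<think>`). The per-token record format and the names `score_experts` and `top_experts` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Normalized pointwise mutual information, in [-1, 1]."""
    if p_xy == 0.0:
        return -1.0  # events never co-occur
    if p_xy == 1.0:
        return 1.0  # both events occur at every position (avoids 0/0)
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def score_experts(
    records: Iterable[tuple[bool, Sequence[int]]], num_experts: int
) -> list[float]:
    """Score each expert by its nPMI with a marker token.

    Each record is (is_marker_token, ids_of_experts_activated_at_that_token).
    Assumes at least one record.
    """
    total = 0
    marker_count = 0
    expert_count: Counter = Counter()
    joint_count: Counter = Counter()
    for is_marker, experts in records:
        total += 1
        marker_count += int(is_marker)
        for e in set(experts):  # count each expert at most once per token
            expert_count[e] += 1
            if is_marker:
                joint_count[e] += 1
    p_y = marker_count / total
    return [
        npmi(joint_count[e] / total, expert_count[e] / total, p_y)
        for e in range(num_experts)
    ]


def top_experts(scores: Sequence[float], k: int) -> list[int]:
    """Indices of the k highest-scoring experts (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
```

An expert that fires on every marker token and nowhere else scores 1.0, while one that never fires on a marker scores -1.0. Keeping the top two experts echoes the paper's title, but how many experts to keep, and in which layers, is left open in this sketch.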

Problem

Research questions and friction points this paper is trying to address.

Improving reasoning performance in MoE models without extra training

Addressing cognitive inefficiencies like overthinking and underthinking

Enhancing reasoning accuracy and cognitive efficiency with RICE

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces RICE for inference-time cognitive steering

Uses nPMI to identify specialized cognitive experts

Improves reasoning accuracy without additional training
