Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

📅 2025-03-23

🤖 AI Summary

Large reasoning models often gain deliberative reasoning at the cost of reduced helpfulness (↓27%), more harmful outputs (↑32%), and substantially higher inference overhead. This work presents the first systematic evaluation of the trade-off between reasoning depth and foundational capabilities across model families (DeepSeek, Qwen, LLaMA) and scales (7B–671B). We propose an adaptive reasoning paradigm that switches dynamically among Zero-Thinking, Less-Thinking, and Summary-Thinking modes, so that inference-time compute is allocated according to task characteristics. We further introduce a unified evaluation framework that jointly measures helpfulness, harmlessness, and inference cost. Experiments show that the approach maintains ≥92% reasoning quality while reducing latency and energy consumption by over 40%, pointing toward more efficient and safer reasoning.

📝 Abstract

Recent advancements in Large Reasoning Models (LRMs), such as OpenAI's o1/o3 and DeepSeek-R1, have demonstrated remarkable performance in specialized reasoning tasks through human-like deliberative thinking and long chain-of-thought reasoning. However, our systematic evaluation across various model families (DeepSeek, Qwen, and LLaMA) and scales (7B to 671B) reveals that acquiring these deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs, including notable declines in helpfulness and harmlessness, alongside substantially increased inference costs. Importantly, we demonstrate that adaptive reasoning -- employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking -- can effectively alleviate these drawbacks. Our empirical insights underline the critical need for developing more versatile LRMs capable of dynamically allocating inference-time compute according to specific task characteristics.

Problem

Research questions and friction points this paper is trying to address.

Trade-offs between deliberative reasoning and foundational capabilities in LRMs

Decline in helpfulness and harmlessness due to advanced reasoning

Need for adaptive reasoning to optimize inference costs

Innovation

Methods, ideas, or system contributions that make the work stand out.

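Adaptive reasoning modes (Zero-Thinking, Less-Thinking, Summary-Thinking) alleviate the losses in helpfulness and harmlessness and the added inference cost

Dynamic allocation of inference-time compute according to task characteristics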

Balancing deliberative and foundational capabilities
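
The three adaptive modes named in the abstract can be pictured as different ways of pre-filling the model's thinking segment before it writes its answer. The Python sketch below is illustrative only and is not taken from the paper: the <think> delimiters, the short Less-Thinking sentence, the task labels, and the routing table are all assumptions chosen for the example, and the routing entries are arbitrary placeholders rather than reported findings.

```python
from enum import Enum


class ThinkingMode(Enum):
    ZERO = "zero"        # Zero-Thinking: empty thinking segment
    LESS = "less"        # Less-Thinking: one short, fixed thought
    SUMMARY = "summary"  # Summary-Thinking: a condensed thought supplied by the caller
    FULL = "full"        # ordinary long chain-of-thought


# Assumed delimiters; real models may use a different thinking format.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

# Hypothetical short thought used for Less-Thinking.
SHORT_THOUGHT = "This request is straightforward, so I can answer directly."


def assistant_prefix(mode: ThinkingMode, summary: str = "") -> str:
    """Return text to pre-fill at the start of the assistant turn."""
    if mode is ThinkingMode.ZERO:
        # Open and close the thinking segment immediately.
        return f"{THINK_OPEN}\n{THINK_CLOSE}\n"
    if mode is ThinkingMode.LESS:
        return f"{THINK_OPEN}\n{SHORT_THOUGHT}\n{THINK_CLOSE}\n"
    if mode is ThinkingMode.SUMMARY:
        if not summary.strip():
            raise ValueError("Summary-Thinking needs a non-empty summary")
        return f"{THINK_OPEN}\n{summary.strip()}\n{THINK_CLOSE}\n"
    # FULL: leave the segment open so the model deliberates freely.
    return f"{THINK_OPEN}\n"


def choose_mode(task_type: str) -> ThinkingMode:
    """Toy router from a task label to a mode (placeholder mapping)."""
    routing = {
        "math": ThinkingMode.FULL,
        "code": ThinkingMode.FULL,
        "chat": ThinkingMode.ZERO,
    }
    # Unknown task types fall back to the cheap Less-Thinking mode.
    return routing.get(task_type, ThinkingMode.LESS)
```

For instance, assistant_prefix(choose_mode("chat")) yields a closed, empty thinking segment, so generation would continue straight into the answer. A real system would need a learned or rule-based classifier in place of the lookup table, and a separate step to produce the summary text for Summary-Thinking.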
