How Far Are We from Optimal Reasoning Efficiency?

📅 2025-06-08
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Chain-of-thought (CoT) reasoning in large reasoning models (LRMs) often yields redundant, excessively long inference chains, resulting in low efficiency and high computational cost; existing fine-tuning methods lack standardized evaluation metrics that reliably quantify efficiency gains. Method: We introduce the "reasoning efficiency frontier" concept and a unified "Reasoning Efficiency Gap" (REG) metric to systematically characterize how far a model falls short of the optimal accuracy–inference-length trade-off. We further propose REO-RL, a reinforcement learning algorithm that combines exponentially spaced sparse token-budget sampling with numerical integration to approximate the full efficiency objective. Results: On Qwen3-4B/8B, REO-RL consistently reduces REG by ≥50% and approaches the efficiency frontier under 16K-token budgets while incurring <0.5% accuracy degradation. REG correlates strongly with human evaluation (Spearman's ρ > 0.92), supporting its fidelity as an efficiency proxy.

📝 Abstract
Large Reasoning Models (LRMs) demonstrate remarkable problem-solving capabilities through extended Chain-of-Thought (CoT) reasoning but often produce excessively verbose and redundant reasoning traces. This inefficiency incurs high inference costs and limits practical deployment. While existing fine-tuning methods aim to improve reasoning efficiency, assessing their efficiency gains remains challenging due to inconsistent evaluations. In this work, we introduce reasoning efficiency frontiers: empirical upper bounds derived from fine-tuning base LRMs across diverse approaches and training configurations. Based on these frontiers, we propose the Reasoning Efficiency Gap (REG), a unified metric quantifying the deviation of any fine-tuned LRM from these frontiers. Systematic evaluation on challenging mathematical benchmarks reveals significant gaps in current methods: they either sacrifice accuracy for short length or remain inefficient under tight token budgets. To reduce the efficiency gap, we propose REO-RL, a class of Reinforcement Learning algorithms that minimizes REG by targeting a sparse set of token budgets. Leveraging numerical integration over strategically selected budgets, REO-RL approximates the full efficiency objective with low error. Through systematic benchmarking, we demonstrate that our efficiency metric, REG, effectively captures the accuracy–length trade-off, with low-REG methods reducing length while maintaining accuracy. Our approach, REO-RL, consistently reduces REG by ≥50% across all evaluated LRMs and matches the Qwen3-4B/8B efficiency frontiers under a 16K token budget with minimal accuracy loss. Ablation studies confirm the effectiveness of our exponential token budget strategy. Finally, our findings highlight that fine-tuning LRMs to perfectly align with the efficiency frontiers remains an open challenge.
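The abstract's core idea — measuring the gap between a model's accuracy-vs-token-budget curve and an efficiency frontier via numerical integration over exponentially spaced budgets — can be sketched as follows. This is an illustrative sketch only, not the paper's implementation; the function names, budget values, and accuracy numbers are all hypothetical.

```python
# Illustrative REG-like metric: the area between an efficiency frontier and a
# model's accuracy-vs-token-budget curve, integrated with the trapezoidal rule
# over exponentially spaced budgets. All names and numbers are hypothetical.

def exponential_budgets(min_budget: int, max_budget: int, n: int) -> list[int]:
    """Return n token budgets spaced exponentially between min_ and max_budget."""
    ratio = (max_budget / min_budget) ** (1 / (n - 1))
    return [round(min_budget * ratio**i) for i in range(n)]

def reg(frontier_acc: list[float], model_acc: list[float],
        budgets: list[int]) -> float:
    """Trapezoidal integral of the frontier-minus-model accuracy gap over the
    budgets, normalized by the budget range; 0.0 means the model sits on the
    frontier."""
    total = 0.0
    for i in range(len(budgets) - 1):
        gap0 = frontier_acc[i] - model_acc[i]
        gap1 = frontier_acc[i + 1] - model_acc[i + 1]
        total += 0.5 * (gap0 + gap1) * (budgets[i + 1] - budgets[i])
    return total / (budgets[-1] - budgets[0])

budgets = exponential_budgets(512, 16384, 6)   # [512, 1024, ..., 16384]
frontier = [0.30, 0.45, 0.60, 0.72, 0.80, 0.84]  # hypothetical frontier accuracies
model    = [0.10, 0.25, 0.45, 0.62, 0.75, 0.82]  # hypothetical fine-tuned model
print(f"budgets: {budgets}")
print(f"REG-like gap: {reg(frontier, model, budgets):.3f}")
```

With only six exponentially spaced budgets, the integral captures the whole 512–16K range while sampling the low-budget region densely, which is where efficient and inefficient models diverge most.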
Problem

Research questions and friction points this paper is trying to address.

Assessing reasoning efficiency gaps in Large Reasoning Models
Improving accuracy-length trade-offs in Chain-of-Thought reasoning
Reducing redundant reasoning traces while maintaining high accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces reasoning efficiency frontiers for LRMs
Proposes REG metric to quantify efficiency deviations
Develops REO-RL algorithm to minimize efficiency gaps
Authors

Jiaxuan Gao — Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University (multi-agent reinforcement learning, large language models)
Shu Yan — Ant Research; Nanjing University
Qixin Tan — IIIS, Tsinghua University; Ant Research
Lu Yang — IIIS, Tsinghua University; Ant Research
Shusheng Xu — IIIS, Tsinghua University (reinforcement learning, NLP, data mining)
Wei Fu — IIIS, Tsinghua University; Ant Research
Zhiyu Mei — Tsinghua University (computer science)
Kaifeng Lyu — Tsinghua University
Yi Wu — IIIS, Tsinghua University; Ant Research