Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) often overthink, generating unnecessarily long and inefficient inference paths that waste computation and can degrade performance. The authors attribute this to the models' limited ability to dynamically select the right modular reasoning strategies ("thinking patterns") at the right position. To address it, the paper proposes a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns, systematically promoting beneficial patterns while removing detrimental ones, and then applies direct preference optimization (DPO) on a pairwise dataset contrasting suboptimal and optimized paths. Experiments on mathematical reasoning benchmarks show up to a 12% accuracy improvement, a 15.6% accuracy gain from originally incorrect responses being corrected, average token usage falling from roughly 5,000 to 3,000, and attention FLOPs reduced by up to 47%, achieving simultaneous gains in both accuracy and efficiency.

📝 Abstract
While the recent success of large reasoning models (LRMs) has significantly advanced LLMs' reasoning capability by optimizing final-answer accuracy with reinforcement learning, it may also drastically increase output length due to overthinking, characterized by unnecessarily complex reasoning paths that waste computation and potentially degrade performance. We hypothesize that such inefficiencies stem from LRMs' limited capability to dynamically select the proper modular reasoning strategies, termed thinking patterns, at the right position. To investigate this hypothesis, we propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns, systematically identifying and promoting beneficial patterns that improve the answer while removing detrimental ones. Empirical analysis confirms that our optimized thinking paths yield more concise yet sufficiently informative trajectories, enhancing reasoning efficiency by reducing attention FLOPs by up to 47% while maintaining accuracy for originally correct responses. Moreover, a non-trivial portion of originally incorrect responses is transformed into correct ones, achieving a 15.6% accuracy improvement with reduced length. Motivated by the improvement brought by the optimized thinking paths, we apply a preference optimization technique supported by a pairwise dataset contrasting suboptimal and optimal reasoning paths. Experimental evaluations across multiple mathematical reasoning benchmarks reveal that our method notably reduces computational overhead while simultaneously improving reasoning accuracy, achieving up to a 12% accuracy improvement and reducing token usage from approximately 5,000 to 3,000 tokens.
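The abstract does not spell out the exact preference objective; assuming the standard DPO loss, with $y_w$ the optimized (concise, correct) path and $y_l$ the original suboptimal path for prompt $x$, it would take the form:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```

Here $\pi_{\mathrm{ref}}$ is the frozen reference LRM and $\beta$ controls how far the policy may drift from it; minimizing this objective shifts probability mass toward the pruned, optimized reasoning paths.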
Problem

Research questions and friction points this paper is trying to address.

Optimizing thinking dynamics to reduce overthinking in large reasoning models
Dynamically selecting efficient reasoning patterns to improve accuracy and efficiency
Reducing computational overhead while maintaining or enhancing reasoning performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic optimization framework that segments reasoning paths into thinking patterns
Preference optimization on pairwise data contrasting suboptimal and optimal paths
Reduced computational overhead with simultaneously improved reasoning accuracy
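The segmentation and pairwise-data steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the cue phrases, function names, and the hand-set keep-mask are all assumptions standing in for the paper's actual pattern-recognition criteria.

```python
import re

# Hypothetical cue phrases marking the start of a new thinking pattern;
# the paper's real segmentation criteria are not specified here.
CUES = r"(?=\b(?:Wait,|Alternatively,|So,))"

def segment_thinking(trace: str) -> list[str]:
    """Split a reasoning trace into candidate thinking-pattern segments
    at each cue phrase (zero-width lookahead keeps the cue in its segment)."""
    return [s.strip() for s in re.split(CUES, trace) if s.strip()]

def make_preference_pair(prompt: str, segments: list[str], keep_mask: list[bool]) -> dict:
    """Build one DPO pair: 'chosen' keeps only the segments flagged as
    beneficial, 'rejected' is the original full reasoning path."""
    chosen = " ".join(s for s, keep in zip(segments, keep_mask) if keep)
    rejected = " ".join(segments)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

trace = ("Compute 2+3. So, the sum is 5. "
         "Wait, let me double-check by counting. "
         "Alternatively, 3+2 is also 5. So, the answer is 5.")
segs = segment_thinking(trace)  # 5 segments, one per cue boundary
# Drop the redundant double-check and re-derivation segments.
pair = make_preference_pair("What is 2+3?", segs, [True, True, False, False, True])
```

In the paper, the keep-mask would instead be derived automatically by testing which segments improve the final answer; collecting many such pairs yields the dataset used for preference optimization.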
👥 Authors

Sohyun An (University of California, Los Angeles) · Search, Optimization, Generative Models
Ruochen Wang (University of California, Los Angeles)
Tianyi Zhou (University of Maryland, College Park)
Cho-Jui Hsieh (University of California, Los Angeles) · Machine Learning, Optimization