🤖 AI Summary
In reinforcement learning–based post-training of language models, the "slow thinking" paradigm can undermine reasoning efficiency: models expend redundant computation on simple tasks and shift their reasoning prematurely on complex ones. Method: This paper proposes AdapThink, an adaptive thinking-preference framework that (1) introduces a group-relative reward function, grounded in model confidence and response characteristics, to dynamically regulate reasoning length, and (2) employs an entropy-guided, diversity-aware sampling mechanism to jointly optimize accuracy and reasoning-path diversity. Results: Evaluated across multiple mathematical reasoning benchmarks, AdapThink significantly enhances the adaptive reasoning capability of DeepSeek-distilled models: it reduces average inference steps by 32%, improves solution stability on complex problems by 19.7%, and maintains, or even slightly improves, overall accuracy.
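To make the first mechanism concrete, here is a minimal, hypothetical sketch of a group-relative reward. The paper does not publish this exact formula; the shaping below only illustrates the stated idea that reflection is scored relative to the sampled group rather than against a fixed length budget, with the penalty direction set by model confidence (all function and variable names are assumptions):

```python
def group_relative_reward(confidences, reflection_counts, correct):
    """Hypothetical group-relative reward sketch.

    confidences:       per-response model confidence in [0, 1]
    reflection_counts: number of reflection-related transition words per response
    correct:           per-response correctness flags
    """
    n = len(confidences)
    mean_refl = sum(reflection_counts) / n  # group statistic, not a fixed budget
    rewards = []
    for conf, refl, ok in zip(confidences, reflection_counts, correct):
        r = 1.0 if ok else 0.0  # base reward: correctness
        # Deviation of this response's reflection count from the group mean.
        rel = (refl - mean_refl) / (mean_refl + 1e-8)
        # Confident model reflecting more than the group: redundant computation.
        r -= conf * max(rel, 0.0)
        # Unconfident model reflecting less than the group: premature shift.
        r -= (1.0 - conf) * max(-rel, 0.0)
        rewards.append(r)
    return rewards
```

Because the reference point is the group mean, the same response can be rewarded or penalized depending on what the model currently samples, which is one way to track the model's evolving capability without a static rule.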
📝 Abstract
Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models, fostering sophisticated self-reflection processes. However, this "slow thinking" paradigm presents a critical challenge to reasoning efficiency: models may expend excessive computation on simple questions and shift reasoning prematurely for complex ones. Previous mechanisms typically rely on static length budgets or predefined rules, lacking adaptability to varying question complexities and models' evolving capabilities. To this end, we propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking while maintaining the performance of reasoning language models. Specifically, AdapThink incorporates two key mechanisms: 1) A group-relative reward function that leverages model confidence and response characteristics to dynamically adjust the preference for reflection-related transition words, without resorting to a fixed length preference. 2) A diversity-aware sampling mechanism that balances the training group's solution accuracy with reasoning diversity via an entropy-guided score. Experiments on several mathematical reasoning datasets with DeepSeek-distilled models demonstrate AdapThink's advantages in enabling adaptive reasoning patterns and mitigating inefficiencies.
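The second mechanism, an entropy-guided score that balances a training group's accuracy against its reasoning diversity, can be sketched as follows. This is an illustrative reading of the abstract, not the paper's actual scoring rule: the reasoning-pattern labels, the normalization, and the mixing weight `alpha` are all assumptions.

```python
import math
from collections import Counter

def entropy_guided_score(patterns, correct, alpha=0.5):
    """Hypothetical diversity-aware group score.

    patterns: a discrete reasoning-pattern label per sampled response
              (how responses are bucketed into patterns is assumed here)
    correct:  per-response correctness flags
    alpha:    assumed weight trading accuracy against diversity
    """
    n = len(patterns)
    accuracy = sum(correct) / n
    # Shannon entropy of the empirical reasoning-pattern distribution.
    counts = Counter(patterns)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_entropy = math.log2(n) if n > 1 else 1.0
    diversity = entropy / max_entropy  # normalized to [0, 1]
    return alpha * accuracy + (1.0 - alpha) * diversity
```

A group of all-correct but identical reasoning paths scores lower than an all-correct group with varied paths, so sampling by this score would keep reasoning diversity in the training batch rather than collapsing onto one pattern.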