🤖 AI Summary
Large reasoning models (LRMs) suffer from "overthinking": they generate unnecessarily long reasoning chains even on simple tasks, which degrades inference efficiency. Method: We propose the Adaptive Cognition Policy Optimization (ACPO) framework, inspired by dual-process theory in cognitive science. ACPO explicitly models two reasoning modes, "fast thinking" (intuition) and "slow thinking" (logic), via system-aware reasoning tokens, and introduces a dynamic system-switching mechanism driven jointly by online task-difficulty estimation and token-budget allocation. Reasoning depth and breadth are optimized together through supervised fine-tuning followed by reinforcement learning. Contribution/Results: ACPO achieves Pareto improvements in both accuracy and efficiency across diverse complex reasoning tasks, including mathematical and symbolic reasoning, reducing redundant reasoning tokens by 38% on average while maintaining or improving answer accuracy.
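The switching mechanism described above can be sketched as a simple decision rule. This is an illustrative reconstruction only: the mode tokens, the `estimate` argument, and the thresholds are hypothetical names chosen for the sketch, not the paper's actual implementation.

```python
# Hypothetical sketch of a dynamic system-switching decision that
# combines an online difficulty estimate with a token budget.
# All identifiers and threshold values here are assumptions.

FAST_TOKEN = "<system1>"  # fast, intuitive reasoning mode
SLOW_TOKEN = "<system2>"  # slow, deliberate reasoning mode

def choose_mode(difficulty: float, tokens_used: int, budget: int,
                threshold: float = 0.5) -> str:
    """Pick a reasoning-mode token from an online difficulty
    estimate (0 = trivial, 1 = hard) and the remaining budget."""
    remaining = budget - tokens_used
    # Easy task, or budget nearly exhausted: fall back to fast thinking.
    if difficulty < threshold or remaining < 0.1 * budget:
        return FAST_TOKEN
    return SLOW_TOKEN
```

A rule like this makes the trade-off explicit: slow thinking is spent only where the estimated difficulty warrants it and the budget still allows it.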
📝 Abstract
Large reasoning models (LRMs) have demonstrated strong performance on complex reasoning tasks, but they often suffer from overthinking, generating redundant content regardless of task difficulty. Inspired by dual-process theory in cognitive science, we propose Adaptive Cognition Policy Optimization (ACPO), a reinforcement learning framework that enables LRMs to reason efficiently through adaptive cognitive allocation and dynamic system switching. ACPO incorporates two key components: (1) system-aware reasoning tokens that explicitly represent the thinking modes, making the model's cognitive process transparent; and (2) online difficulty estimation combined with a token-length budget to guide adaptive system switching and reasoning during reinforcement learning. To this end, we propose a two-stage training strategy. The first stage uses supervised fine-tuning to cold-start the model, enabling it to generate reasoning paths with explicit thinking modes. In the second stage, we apply ACPO to further enhance adaptive system switching for difficulty-aware reasoning. Experimental results demonstrate that ACPO effectively reduces redundant reasoning while adaptively adjusting cognitive allocation to task complexity, achieving efficient hybrid reasoning.
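One plausible way the RL stage could couple accuracy with the token-length budget is a shaped reward that penalizes tokens spent beyond the budget. The abstract does not give the reward form, so the function below, including the linear penalty and the `alpha` coefficient, is an assumption for illustration.

```python
# Illustrative reward shaping for the RL stage: a correctness reward
# minus a linear penalty on tokens generated beyond the budget.
# The penalty form and the alpha value are assumptions, not the
# paper's stated objective.

def reward(correct: bool, num_tokens: int, budget: int,
           alpha: float = 0.001) -> float:
    """Return 1.0 for a correct answer, 0.0 otherwise, minus
    alpha per token of budget overshoot."""
    base = 1.0 if correct else 0.0
    overshoot = max(0, num_tokens - budget)
    return base - alpha * overshoot
```

Under a reward of this shape, a policy is pushed toward short reasoning on easy inputs (where correctness is cheap) while still being able to spend the full budget on hard ones.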