🤖 AI Summary
Chain-of-thought (CoT) reasoning often incurs redundant, inefficient inference on simple problems. Method: This paper proposes a difficulty-aware dynamic reasoning framework that requires no architectural modification. Through a post-training strategy combining supervised fine-tuning (SFT) and direct preference optimization (DPO), the model learns to autonomously adjust CoT length according to problem complexity, generating concise reasoning for simple problems and deeper derivations for complex ones. Contribution/Results: The approach introduces the first purely data-driven, learnable mechanism for controlling reasoning paths. Experiments demonstrate that the model maintains or improves reasoning accuracy while significantly reducing average output length and computational cost, achieving an efficient balance between “on-demand reasoning” and “proportional thinking.”
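To make the DPO side of the recipe concrete, here is a minimal sketch of how length-aware preference pairs might be constructed. The function names, the linear token budget, and the `[0, 1]` difficulty scale are illustrative assumptions, not details taken from the paper; among correct traces, the one whose length is closer to the difficulty-proportional budget is marked as preferred.

```python
# Hypothetical sketch: building DPO preference pairs that reward
# difficulty-proportional chain-of-thought length. All names and the
# budget heuristic are assumptions for illustration.

def target_length(difficulty, min_tokens=8, max_tokens=512):
    """Map a difficulty score in [0, 1] to a desired CoT token budget."""
    return int(min_tokens + difficulty * (max_tokens - min_tokens))

def build_preference_pair(problem, difficulty, trace_a, trace_b):
    """Prefer the trace whose length is closer to the difficulty budget.

    Both traces are assumed to reach the correct answer, so length
    alone breaks the tie (whitespace tokenization keeps the sketch simple).
    """
    budget = target_length(difficulty)
    gap_a = abs(len(trace_a.split()) - budget)
    gap_b = abs(len(trace_b.split()) - budget)
    chosen, rejected = (trace_a, trace_b) if gap_a <= gap_b else (trace_b, trace_a)
    return {"prompt": problem, "chosen": chosen, "rejected": rejected}

# For an easy problem (low difficulty), the short trace wins the pair.
pair = build_preference_pair(
    "What is 2 + 2?",
    difficulty=0.05,
    trace_a="2 + 2 = 4. Answer: 4.",
    trace_b="Let us reason carefully. " * 20 + "Answer: 4.",
)
```

The resulting `{"prompt", "chosen", "rejected"}` records match the triplet format commonly used by DPO training pipelines.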
📝 Abstract
Chain-of-thought reasoning, while powerful, can produce unnecessarily verbose output for simple problems. We present a framework for difficulty-aware reasoning that teaches models to dynamically adjust reasoning depth based on problem complexity. Remarkably, we show that models can be endowed with such dynamic inference pathways without any architectural modifications: we simply post-train on carefully curated data whose chain-of-thought traces are proportional in length to problem difficulty. Our analysis reveals that post-training via supervised fine-tuning (SFT) primarily captures surface patterns such as reasoning length and format, while direct preference optimization (DPO) preserves reasoning accuracy; their combination reduces output length while maintaining or improving performance. Both quantitative metrics and qualitative assessments confirm that models can learn to "think proportionally", reasoning minimally on simple problems while maintaining depth for complex ones.
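The curation step described above, selecting SFT traces whose length tracks problem difficulty, could be sketched as follows. The field names, the token budget, and the candidate format are our own assumptions, not the paper's specification; the sketch keeps, for each problem, the correct candidate trace closest in length to a difficulty-proportional budget.

```python
# Hypothetical sketch of difficulty-proportional SFT data curation.
# Names, the linear budget, and the [0, 1] difficulty scale are assumptions.

def curate_sft_example(problem, difficulty, candidates,
                       min_tokens=8, max_tokens=512):
    """Keep the correct trace whose length best matches the budget.

    `candidates` is a list of (trace, is_correct) pairs; the budget grows
    linearly with difficulty. Returns None when no candidate is correct.
    """
    budget = min_tokens + difficulty * (max_tokens - min_tokens)
    correct = [trace for trace, ok in candidates if ok]
    if not correct:
        return None  # no usable trace; drop this problem from the SFT set
    best = min(correct, key=lambda t: abs(len(t.split()) - budget))
    return {"prompt": problem, "completion": best}

# An easy problem keeps the concise correct trace; the wrong and the
# needlessly long candidates are discarded.
example = curate_sft_example(
    "What is 7 * 6?",
    difficulty=0.05,
    candidates=[
        ("7 * 6 = 42. Answer: 42.", True),
        ("Step 1: ... " * 30 + "Answer: 42.", True),
        ("7 * 6 = 48.", False),
    ],
)
```

Fine-tuning on `{"prompt", "completion"}` records of this shape is one plausible way to realize the "traces proportional in length to difficulty" curation the abstract describes.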