TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning

📅 2026-01-29
🤖 AI Summary
This work addresses the high computational cost and inefficiency of large language models on complex reasoning tasks, often caused by excessively long chains of thought. The authors propose TACLer, a framework that combines capability-aware progressive curriculum reinforcement learning with a hybrid Thinking/NoThinking inference paradigm. The former dynamically adjusts the complexity of training data based on model proficiency, while the latter adaptively selects between reasoning and non-reasoning strategies at inference time. Evaluated on four mathematical reasoning benchmarks, TACLer substantially outperforms existing methods, reducing training compute by over 50%, cutting inference token consumption by more than 42%, and improving accuracy by over 9%, jointly optimizing efficiency and performance.
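The capability-aware curriculum described above can be sketched as a simple staged scheduler: advance to harder training data only once the model's measured pass rate on the current difficulty tier crosses a proficiency threshold. This is a minimal illustration with hypothetical names and thresholds; the paper's actual advancement criteria are not specified here.

```python
def next_difficulty(current_tier: int, pass_rate: float,
                    max_tier: int = 3, threshold: float = 0.7) -> int:
    """Return the difficulty tier for the next RL training stage.

    Hypothetical sketch of capability-aware curriculum scheduling:
    the model graduates to harder problems only after demonstrating
    proficiency (pass_rate >= threshold) on its current tier.
    """
    if pass_rate >= threshold and current_tier < max_tier:
        return current_tier + 1  # proficient: move to harder problems
    return current_tier          # otherwise keep training at this tier
```

For example, a model passing 80% of tier-1 problems would advance to tier 2, while one passing only 50% would stay put; the schedule thereby spends compute on the knowledge the model still lacks.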

📝 Abstract
Large Language Models (LLMs) have shown remarkable performance on complex reasoning tasks, especially when equipped with long chain-of-thought (CoT) reasoning. However, eliciting long CoT typically requires large-scale reinforcement learning (RL) training and often leads to overthinking with redundant intermediate steps. To improve learning and reasoning efficiency, while preserving or even enhancing performance, we propose TACLer, a model-tailored curriculum reinforcement learning framework that gradually increases the complexity of the data based on the model's proficiency in multi-stage RL training. TACLer features two core components: (i) tailored curriculum learning that determines what knowledge the model lacks and needs to learn in progressive stages; (ii) a hybrid Thinking/NoThinking reasoning paradigm that balances accuracy and efficiency by enabling or disabling the Thinking mode. Our experiments show that TACLer yields a twofold advantage in learning and reasoning: (i) it reduces computational cost, cutting training compute by over 50% compared to long thinking models and reducing inference token usage by over 42% relative to the base model; and (ii) it improves accuracy by over 9% on the base model, consistently outperforming state-of-the-art NoThinking and Thinking baselines across four math datasets with complex problems.
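The hybrid Thinking/NoThinking paradigm in the abstract can be illustrated as a per-query mode switch: answer easy queries directly (saving tokens) and fall back to long CoT for hard ones. The confidence signal and threshold below are hypothetical stand-ins; the paper's actual selection mechanism may differ.

```python
def choose_mode(confidence: float, tau: float = 0.9) -> str:
    """Hypothetical sketch of adaptive Thinking/NoThinking selection.

    If the model's estimated confidence in a direct answer is high,
    skip the long chain of thought ('nothink'); otherwise engage
    full reasoning ('think'). Trades a small accuracy risk on easy
    inputs for large inference-token savings.
    """
    return "nothink" if confidence >= tau else "think"
```

Under this scheme, most easy problems are answered without a reasoning trace, which is one plausible way a model could cut inference token usage while keeping accuracy on hard problems.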
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought Reasoning
Reinforcement Learning
Reasoning Efficiency
Overthinking
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum Reinforcement Learning
Tailored Curriculum
Hybrid Thinking Paradigm
Efficient Reasoning
Large Language Models