Rethinking Easy-to-Hard: Limits of Curriculum Learning in Post-Training for Deductive Reasoning

📅 2026-03-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether curriculum learning helps when post-training large language models for deductive reasoning. Specifically, it asks whether ordering training tasks by reasoning complexity, rather than by superficial features, yields any advantage over random sampling. Using synthetic arithmetic and logical benchmarks in which reasoning difficulty is tightly controlled, the authors systematically evaluate multiple curriculum schedules across several model families under both supervised fine-tuning (SFT) and reinforcement learning (RL). The results show that an increasing-difficulty curriculum does not consistently outperform random sampling in either accuracy or response length. This challenges the prevailing assumption that curriculum learning inherently benefits compositional generalization and suggests it has limited utility in post-training for deductive reasoning.
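To make the comparison concrete, here is a minimal sketch of the two orderings being contrasted: a random-sampling baseline and an easy-to-hard curriculum. It assumes each training example carries an integer `depth` label counting the inference steps it requires. `Example`, `depth`, `random_schedule`, `easy_to_hard_schedule`, and `batches` are hypothetical names chosen for illustration, not the authors' code or any library API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    prompt: str
    answer: str
    depth: int  # hypothetical difficulty label: number of inference steps required


def random_schedule(examples, seed=0):
    """Baseline: a uniformly random permutation that ignores difficulty."""
    order = list(examples)
    random.Random(seed).shuffle(order)
    return order


def easy_to_hard_schedule(examples, seed=0):
    """Curriculum: ascending depth.

    Shuffling first and then applying Python's stable sort randomizes the
    order within each difficulty level, so only difficulty imposes structure.
    """
    order = random_schedule(examples, seed)
    order.sort(key=lambda ex: ex.depth)
    return order


def batches(order, batch_size):
    """Split a schedule into consecutive mini-batches for SFT or RL updates."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# Usage: both schedules contain exactly the same examples; only the order differs.
pool = [Example(f"q{i}", f"a{i}", depth=(i % 4) + 1) for i in range(8)]
curriculum = batches(easy_to_hard_schedule(pool), batch_size=4)
baseline = batches(random_schedule(pool), batch_size=4)
```

In this framing, the paper's question is whether a model trained on the curriculum batches ends up more accurate than one trained on the baseline batches given the same examples and training budget. The summary above reports no consistent difference.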

📝 Abstract

Curriculum learning (CL), motivated by the intuition that learning in increasing order of difficulty should ease generalization, is commonly adopted in both pre-training and post-training of large language models (LLMs). The intuition of CL is particularly compelling for compositional reasoning, where complex problems are built from elementary inference rules; however, the actual impact of CL on such tasks remains largely underexplored. We present a systematic empirical study of CL for post-training of LLMs, using synthetic arithmetic and logical benchmarks where difficulty is characterized by reasoning complexity rather than surface-level proxies. Surprisingly, across multiple model families and curriculum schedules, we find no robust advantage in difficulty-based sequencing over standard random sampling in either accuracy or response length. These findings persist across both supervised fine-tuning (SFT) and reinforcement learning (RL) methods. Our study suggests that, in the context of deductive reasoning, the specific ordering of training examples plays a negligible role in achieving compositional generalization, challenging the practical utility of curriculum-based post-training.
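One way to characterize difficulty "by reasoning complexity rather than surface-level proxies" is to control the length of the dependency chain needed to reach the answer while holding prompt length fixed. The sketch below does this for a toy chained-arithmetic task. It is an illustrative assumption, not the paper's benchmark generator. `make_problem`, `depth`, and `n_statements` are hypothetical names, and the distractor-padding scheme is one possible design among many.

```python
import random


def make_problem(depth, n_statements, seed=0):
    """Build a toy arithmetic problem whose target depends on a chain of
    `depth` statements (v0, v1 = v0 + c, ...).

    Independent distractor statements pad the prompt to `n_statements`
    lines, so surface length stays constant while reasoning depth varies.
    Returns (prompt, answer).
    """
    if not 1 <= depth <= n_statements:
        raise ValueError("need 1 <= depth <= n_statements")
    rng = random.Random(seed)

    # Dependency chain: each v_i is computed from v_{i-1}.
    value = rng.randint(1, 9)
    lines = [f"v0 = {value}"]
    for i in range(1, depth):
        delta = rng.randint(1, 9)
        value += delta
        lines.append(f"v{i} = v{i - 1} + {delta}")

    # Distractors: constants the answer never uses.
    for j in range(n_statements - depth):
        lines.append(f"d{j} = {rng.randint(1, 9)}")

    # Shuffle so statement position does not reveal the chain order.
    rng.shuffle(lines)
    prompt = "\n".join(lines) + f"\nWhat is v{depth - 1}?"
    return prompt, value


# Usage: same prompt length (6 statements), increasing reasoning depth.
for d in (1, 3, 6):
    prompt, answer = make_problem(depth=d, n_statements=6, seed=d)
```

Because every prompt has the same number of statements, a difficulty-ordered curriculum built on `depth` varies only the required inference steps, not input length.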

Problem

Research questions and friction points this paper is trying to address.

curriculum learning

deductive reasoning

compositional generalization

post-training

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum Learning

Deductive Reasoning

Compositional Generalization