🤖 AI Summary
To address low sample efficiency and rigid difficulty adaptation in large language models' (LLMs) mathematical reasoning, this paper proposes a customized curriculum learning framework. Methodologically, it integrates curriculum learning, adaptive difficulty modeling, dynamic prompt engineering, supervised fine-tuning (SFT), and reinforcement learning (RL) into a unified optimization pipeline. Key contributions include: (1) the first model-capability-driven adaptive difficulty assessment mechanism, which dynamically quantifies instance difficulty from the model's own performance; and (2) Guided Prompting, a dynamic prompt-injection technique that strategically reduces cognitive load for high-difficulty samples, enabling their effective incorporation into training via feedback. Evaluated on five mainstream mathematical reasoning benchmarks, the framework consistently outperforms uniform sampling across both SFT and RL paradigms, achieving significant gains in sample utilization efficiency and final reasoning accuracy.
📝 Abstract
Large Language Models (LLMs) have achieved remarkable performance across various reasoning tasks, yet post-training is constrained by inefficient sample utilization and inflexible handling of samples across difficulty levels. To address these limitations, we propose Customized Curriculum Learning (CCL), a novel framework with two key innovations. First, we introduce a model-adaptive difficulty definition that customizes curriculum datasets based on each model's individual capabilities rather than on predefined difficulty metrics. Second, we develop "Guided Prompting," which dynamically reduces sample difficulty through strategic hints, enabling effective utilization of challenging samples that would otherwise degrade performance. Comprehensive experiments on supervised fine-tuning and reinforcement learning demonstrate that CCL significantly outperforms uniform training approaches across five mathematical reasoning benchmarks, confirming its effectiveness in enhancing sample utilization and model performance under both paradigms.
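The two mechanisms described above might be sketched as follows. This is a minimal illustration, not the paper's implementation: the pass-rate difficulty definition, the hint threshold, and every function name here are assumptions introduced for exposition.

```python
import random

def estimate_difficulty(model, problem, answer, k=8):
    """Model-adaptive difficulty: fraction of k sampled attempts that fail.

    Assumed definition for illustration; `model` is any callable
    returning a candidate answer string.
    """
    fails = sum(model(problem) != answer for _ in range(k))
    return fails / k  # 0.0 = trivial for this model, 1.0 = unsolved

def guided_prompt(problem, hint, difficulty, threshold=0.75):
    """Guided Prompting sketch: inject a hint only when this model
    finds the sample hard, lowering its effective difficulty."""
    if difficulty >= threshold:
        return f"{problem}\nHint: {hint}"
    return problem

# Toy stand-in model that only solves the easy problem.
def toy_model(prompt):
    return "4" if "2 + 2" in prompt else random.choice(["?", "5"])

d_easy = estimate_difficulty(toy_model, "What is 2 + 2?", "4")
d_hard = estimate_difficulty(toy_model, "Factor x^2 - 5x + 6.", "(x-2)(x-3)")
print(d_easy, d_hard)  # 0.0 1.0 for this toy model
print(guided_prompt("Factor x^2 - 5x + 6.", "Find two numbers that sum to 5.", d_hard))
```

Because difficulty is measured against the model's own pass rate rather than a fixed metric, the same dataset yields a different curriculum for each model, which is the core idea behind CCL's model-adaptive difficulty definition.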