Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning

2026-03-05
Citations: 0
Influential: 0
AI Summary
This work addresses two inefficiencies in training large language models for mathematical reasoning: the reliance on massive datasets, and the poor sample utilization of traditional unidirectional curriculum learning, which escalates problem difficulty even when foundational gaps persist. The authors propose a multi-agent framework with a bidirectional curriculum generation mechanism grounded in the Optimal Pacing Theorem. The mechanism adjusts problem difficulty dynamically, alternating between complicating problems to challenge the model's capabilities and simplifying them to repair its reasoning flaws. Diagnostic analysis of reasoning failures, combined with targeted simplification strategies, closes the feedback loop. The authors report that the approach significantly outperforms existing baselines in reasoning performance while using substantially less training data.

Abstract
Enhancing mathematical reasoning in Large Language Models typically demands massive datasets, yet data efficiency remains a critical bottleneck. While Curriculum Learning attempts to structure this process, standard unidirectional approaches (simple-to-complex) suffer from inefficient sample utilization: they blindly escalate complexity even when foundational gaps persist, leading to wasted computation on unsolvable problems. To maximize the instructional value of every training sample, we introduce a novel Bidirectional Curriculum Generation framework. Instead of following a rigid trajectory, our multi-agent ecosystem mimics adaptive pedagogy to establish a closed feedback loop. It dynamically generates data by either complicating problems to challenge the model or, crucially, simplifying them to repair specific reasoning failures. This mechanism ensures that the model consumes only the most effective data at any given stage. Grounded in the Optimal Pacing Theorem, our approach optimizes the learning trajectory, significantly outperforming baselines while using substantially fewer instruction samples.
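The closed feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Problem` class, the integer `difficulty` field, and the `solves`, `complicate`, and `simplify` callables are hypothetical stand-ins for the paper's agents, and the toy solver simply succeeds up to a fixed difficulty. The sketch shows only the bidirectional decision: raise difficulty after a success, lower it after a failure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    text: str
    difficulty: int  # hypothetical integer difficulty level


def bidirectional_curriculum(
    seed: Problem,
    solves: Callable[[Problem], bool],
    complicate: Callable[[Problem], Problem],
    simplify: Callable[[Problem], Problem],
    rounds: int,
) -> List[Problem]:
    """Collect problems, moving difficulty up or down after each attempt."""
    curriculum: List[Problem] = []
    current = seed
    for _ in range(rounds):
        curriculum.append(current)
        if solves(current):
            # The model succeeded: generate a harder variant to challenge it.
            current = complicate(current)
        else:
            # The model failed: generate an easier variant to repair the gap.
            current = simplify(current)
    return curriculum


# Toy stand-ins (assumptions): the "model" solves anything up to difficulty 3.
def toy_solves(p: Problem) -> bool:
    return p.difficulty <= 3


def toy_complicate(p: Problem) -> Problem:
    return Problem(p.text + " (harder)", p.difficulty + 1)


def toy_simplify(p: Problem) -> Problem:
    return Problem(p.text + " (easier)", max(1, p.difficulty - 1))


problems = bidirectional_curriculum(
    Problem("x + 1 = 2", 1), toy_solves, toy_complicate, toy_simplify, rounds=6
)
levels = [p.difficulty for p in problems]
print(levels)  # [1, 2, 3, 4, 3, 4]
```

A unidirectional curriculum would keep climbing past difficulty 4 regardless of the failure there. In this sketch the difficulty instead oscillates around the boundary of what the toy solver can handle, which is the behavior the abstract attributes to the bidirectional mechanism.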
Problem

Research questions and friction points this paper is trying to address.

mathematical reasoning
data efficiency
curriculum learning
sample utilization
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional Curriculum Generation
Multi-Agent Framework
Data-Efficient Learning
Adaptive Pedagogy
Mathematical Reasoning
Boren Hu
Zhejiang University
Xiao Liu
University of Macau
Boci Peng
Peking University
GraphRAG, LLMs, Knowledge Graphs, Recommendation
Xinping Zhao
Harbin Institute of Technology (Shenzhen)
Xiaoran Shang
Wuhan University
Yun Zhu
Shanghai Artificial Intelligence Laboratory
Lijun Wu
Shanghai AI Laboratory
ML, LLM, AI4Science