🤖 AI Summary
Existing mathematical data augmentation methods overlook LLMs’ specific failure modes, producing synthetic problems that lack diagnostic precision and deliver only marginal performance gains. To address this, we propose a defect-aware collaborative training framework: first, we perform fine-grained failure analysis to identify model weaknesses in mathematical reasoning; second, multiple expert LLMs collaboratively generate, critique, and iteratively refine high-difficulty, weakness-targeted problems; third, we apply progressive fine-tuning to strengthen deficient capabilities. Evaluated on six mainstream mathematical benchmarks, our method achieves an average 12.57% improvement over strong baselines and sets a new state of the art. Our core contribution lies in unifying failure diagnostics, defect-driven data synthesis, and progressive learning into a single, interpretable, and scalable paradigm for enhancing LLMs’ mathematical reasoning capabilities.
📝 Abstract
Large Language Models (LLMs) excel at solving mathematical problems, yet their performance is often limited by the availability of high-quality, diverse training data. Existing methods focus on augmenting datasets through rephrasing or difficulty progression but overlook the specific failure modes of LLMs. This results in synthetic questions that the model can already solve, providing minimal performance gains. To address this, we propose WarriorMath, a defect-aware framework for mathematical problem solving that integrates both targeted data synthesis and progressive training. In the synthesis stage, we employ multiple expert LLMs in a collaborative process to generate, critique, and refine problems. Questions that base LLMs fail to solve are identified and iteratively improved through expert-level feedback, producing high-quality, defect-aware training data. In the training stage, we introduce a progressive learning framework that iteratively fine-tunes the model using increasingly challenging data tailored to its weaknesses. Experiments on six mathematical benchmarks show that WarriorMath outperforms strong baselines by 12.57% on average, setting a new state of the art. Our results demonstrate the effectiveness of a defect-aware, multi-expert framework for improving LLMs’ mathematical reasoning ability.
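The synthesis stage described above can be sketched as a filter-and-refine loop: candidate problems the base model already solves are discarded, and the rest are passed through expert critique rounds before being kept as training data. The sketch below is illustrative only, assuming stub functions (`solve`, `critique`, `refine`) in place of real LLM calls; none of these names come from the paper.

```python
# Hypothetical sketch of a defect-aware synthesis loop in the spirit of
# WarriorMath. solve/critique/refine are stubs standing in for LLM queries.

def solve(base_model, problem):
    # Stub: the "model" solves a problem iff its core text is in a known set.
    core = problem.split(" [", 1)[0]
    return core in base_model["solvable"]

def critique(expert, problem):
    # Stub: an expert returns feedback pinpointing the weak reasoning step.
    return f"{expert}: tighten the hardest step"

def refine(problem, feedback):
    # Stub: refinement annotates the problem with the expert's feedback.
    return f"{problem} [{feedback}]"

def synthesize_defect_aware(base_model, experts, candidates):
    """Keep only problems the base model fails, refined by each expert in turn."""
    training_data = []
    for q in candidates:
        if solve(base_model, q):
            continue  # already solvable -> little diagnostic value, discard
        for expert in experts:  # collaborative critique-and-refine rounds
            q = refine(q, critique(expert, q))
        training_data.append(q)
    return training_data

base = {"solvable": {"Compute 2+2."}}
data = synthesize_defect_aware(
    base, ["expertA", "expertB"],
    ["Compute 2+2.", "Prove the AM-GM inequality."],
)
# Only the failed problem survives, carrying both experts' refinements.
```

In a real pipeline each stub would be an LLM call, and the kept problems would feed the progressive training stage, ordered from easier to harder.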