Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch

📅 2024-10-24
📈 Citations: 10
Influential: 2
🤖 AI Summary
To address the scarcity of large-scale, high-quality mathematical reasoning data in the open-source community, this paper introduces ScaleQuest, a framework that generates diverse, high-fidelity math problem-answer pairs from scratch, without seed data or closed proprietary models, using only lightweight 7B-scale language models. Its core innovation is a two-stage question-tuning process: Question Fine-Tuning (QFT) followed by Question Preference Optimization (QPO), combined with rule- and model-guided diversity control over the generated questions. Using ScaleQuest, the authors construct ScaleMath, a million-sample mathematical reasoning dataset. Models trained on ScaleMath consistently outperform those trained on existing open-source datasets (e.g., MetaMath, PRM800K) in both in-domain and out-of-domain evaluations, with notable gains extending to code reasoning tasks. Results further show that performance improves steadily as the volume of synthesized training data grows.

📝 Abstract
Improving the mathematical reasoning capabilities of Large Language Models (LLMs) is critical for advancing artificial intelligence. However, access to extensive, diverse, and high-quality reasoning datasets remains a significant challenge, particularly for the open-source community. In this paper, we propose ScaleQuest, a novel, scalable, and cost-effective data synthesis method that enables the generation of large-scale mathematical reasoning datasets using lightweight 7B-scale models. ScaleQuest introduces a two-stage question-tuning process comprising Question Fine-Tuning (QFT) and Question Preference Optimization (QPO) to unlock the question generation capabilities of problem-solving models. By generating diverse questions from scratch -- without relying on powerful proprietary models or seed data -- we produce a dataset of 1 million problem-solution pairs. Our experiments demonstrate that models trained on our data outperform existing open-source datasets in both in-domain and out-of-domain evaluations. Furthermore, our approach shows continued performance improvement as the volume of training data increases, highlighting its potential for ongoing data scaling. The extensive improvements observed in code reasoning tasks demonstrate the generalization capabilities of our proposed method. Our work provides the open-source community with a practical solution to enhance the mathematical reasoning abilities of LLMs.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM mathematical reasoning with scalable data synthesis
Generating diverse math questions without proprietary models or seed data
Improving open-source LLM performance via large-scale question-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable data synthesis using lightweight 7B models
Two-stage question-tuning: QFT and QPO
Generates diverse questions without seed data
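The synthesis loop implied by the points above can be sketched in miniature: sample candidate questions from a question generator, apply rule-guided filters, and enforce diversity via deduplication before a solver produces answers. This is an illustrative sketch only; `generate_question`, `passes_rule_filters`, and `dedup_key` are hypothetical stand-ins for the paper's QFT+QPO-tuned 7B generator and its actual filtering criteria, not the authors' implementation.

```python
import hashlib
import random

def generate_question(rng):
    # Hypothetical stand-in for sampling from the QFT+QPO-tuned question model.
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    return f"What is {a} * {b}?"

def passes_rule_filters(question):
    # Rule-guided control: crude length/format checks (illustrative only).
    return 10 <= len(question) <= 512 and question.endswith("?")

def dedup_key(question):
    # Normalize whitespace and case, then hash, so near-identical
    # questions collapse to one key for diversity control.
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def synthesize(n_target, seed=0):
    rng = random.Random(seed)
    seen, dataset = set(), []
    while len(dataset) < n_target:
        q = generate_question(rng)
        if not passes_rule_filters(q):
            continue
        key = dedup_key(q)
        if key in seen:  # model-guided quality scoring would also slot in here
            continue
        seen.add(key)
        dataset.append(q)  # a solver model would then attach the solution
    return dataset

questions = synthesize(100)
```

In the paper's actual pipeline, the generator is an autoregressively sampled language model and the filtering combines rules with model-based scoring; the structure of the loop (sample, filter, deduplicate, solve) is what this sketch is meant to convey.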