Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models

📅 2025-04-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical bottleneck of scarce high-quality training data hindering the programming and mathematical reasoning capabilities of small language models. We propose the Think, Prune, Train (TPT) framework: an iterative paradigm that (1) generates reasoning traces via self-instruction, (2) rigorously prunes the synthetic data using ground-truth answer verification, and (3) applies lightweight multi-round fine-tuning, establishing a closed-loop "reason, prune, train" pipeline. TPT introduces a truth-driven self-purification mechanism for synthetic data, enabling substantial capability gains without increasing model parameter count. Experiments demonstrate significant improvements: Gemma2-2B achieves 57.6% Pass@1 on GSM8K (a 15.7-point absolute gain), Gemma2-9B reaches 82.0%, matching LLaMA-3.1-70B, and LLaMA-3.1-70B itself attains 91.0%, surpassing GPT-4o.

📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities in programming and mathematical reasoning tasks, but are constrained by limited high-quality training data. Synthetic data can be leveraged to enhance fine-tuning outcomes, but several factors influence this process, including model size, synthetic data volume, pruning strategy, and number of fine-tuning rounds. We explore these axes and investigate which conditions enable model self-improvement. We introduce the Think, Prune, Train process, a scalable framework that iteratively fine-tunes models on their own reasoning traces, using ground-truth pruning to ensure high-quality training data. This approach yields improved performance: on GSM8K, Gemma2-2B achieves a Pass@1 of 57.6% (from 41.9%), Gemma2-9B reaches 82%, matching LLaMA-3.1-70B, and LLaMA-3.1-70B attains 91%, even surpassing GPT-4o, demonstrating the effectiveness of self-generated reasoning and systematic data selection for improving LLM capabilities.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning with limited high-quality training data
Optimizing synthetic data use for model self-improvement
Scaling reasoning performance without increasing model size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative fine-tuning on self-generated reasoning traces
Ground-truth pruning for high-quality synthetic data
Scalable framework enhancing small model performance
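The iterative "think, prune, train" loop described above can be sketched as a minimal pipeline. Everything below is illustrative: the `Trace` class, the `model` callable, and the `fine_tune` stand-in are assumptions for exposition, not the paper's implementation. The core idea is that only self-generated traces whose final answer matches the ground-truth label survive pruning and enter the next fine-tuning round.

```python
# Hedged sketch of one Think-Prune-Train (TPT) iteration.
# All names here are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Trace:
    problem: str    # the question text
    reasoning: str  # chain-of-thought sampled from the model
    answer: int     # final answer extracted from the trace

def generate_traces(model: Callable[[str], Trace],
                    problems: List[str],
                    samples_per_problem: int = 4) -> List[Trace]:
    """Step 1 (Think): sample several candidate traces per problem."""
    return [model(p) for p in problems for _ in range(samples_per_problem)]

def prune(traces: List[Trace], ground_truth: Dict[str, int]) -> List[Trace]:
    """Step 2 (Prune): keep only traces whose answer matches the label."""
    return [t for t in traces if t.answer == ground_truth[t.problem]]

def tpt_round(model, fine_tune, problems, ground_truth):
    """Step 3 (Train): fine-tune on the pruned traces; `fine_tune` is a
    placeholder for supervised fine-tuning and returns the updated model."""
    kept = prune(generate_traces(model, problems), ground_truth)
    return fine_tune(model, kept)
```

Repeating `tpt_round` for multiple rounds yields the closed loop: each round's model generates the training data for the next, with ground-truth verification acting as the quality filter.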
Caia Costello
Department of Computer Science, Stanford University
Simon Guo
Stanford University
Computer Systems · Machine Learning
Anna Goldie
Department of Computer Science, Stanford University
Azalia Mirhoseini
Assistant Professor of Computer Science, Stanford - Google DeepMind
AI · Systems · Scaling