Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models

📅 2025-04-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical bottleneck of scarce high-quality training data hindering the programming and mathematical reasoning capabilities of small language models. We propose the Think, Prune, Train (TPT) framework: an iterative paradigm that (1) generates reasoning traces via self-instruction, (2) rigorously prunes the synthetic data using ground-truth answer verification, and (3) applies lightweight multi-round fine-tuning, establishing a closed-loop "reason, prune, train" pipeline. TPT introduces a truth-driven self-purification mechanism for synthetic data, enabling substantial capability gains without increasing model parameter count. Experiments demonstrate significant improvements: Gemma2-2B achieves 57.6% Pass@1 on GSM8K (a 15.7-point absolute gain), Gemma2-9B reaches 82.0%, matching LLaMA-3.1-70B, and LLaMA-3.1-70B itself attains 91.0%, surpassing GPT-4o.

📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities in programming and mathematical reasoning tasks, but are constrained by limited high-quality training data. Synthetic data can be leveraged to enhance fine-tuning outcomes, but several factors influence this process, including model size, synthetic data volume, pruning strategy, and number of fine-tuning rounds. We explore these axes and investigate which conditions enable model self-improvement. We introduce the Think, Prune, Train process, a scalable framework that iteratively fine-tunes models on their own reasoning traces, using ground-truth pruning to ensure high-quality training data. This approach yields improved performance: on GSM8K, Gemma2-2B achieves a Pass@1 of 57.6% (from 41.9%), Gemma2-9B reaches 82%, matching LLaMA-3.1-70B, and LLaMA-3.1-70B attains 91%, even surpassing GPT-4o, demonstrating the effectiveness of self-generated reasoning and systematic data selection for improving LLM capabilities.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning with limited high-quality training data
Optimizing synthetic data use for model self-improvement
Scaling reasoning performance without increasing model size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative fine-tuning on self-generated reasoning traces
Ground-truth pruning for high-quality synthetic data
Scalable framework enhancing small model performance
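The iterative "think, prune, train" loop described above can be sketched as a minimal pipeline. Everything below is illustrative: the `Trace` class, the `model` callable, and the `fine_tune` stand-in are assumptions for exposition, not the paper's implementation. The core idea is that only self-generated traces whose final answer matches the ground-truth label survive pruning and enter the next fine-tuning round.

```python
# Hedged sketch of one Think-Prune-Train (TPT) iteration.
# All names here are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Trace:
    problem: str    # the question text
    reasoning: str  # chain-of-thought sampled from the model
    answer: int     # final answer extracted from the trace

def generate_traces(model: Callable[[str], Trace],
                    problems: List[str],
                    samples_per_problem: int = 4) -> List[Trace]:
    """Step 1 (Think): sample several candidate traces per problem."""
    return [model(p) for p in problems for _ in range(samples_per_problem)]

def prune(traces: List[Trace], ground_truth: Dict[str, int]) -> List[Trace]:
    """Step 2 (Prune): keep only traces whose answer matches the label."""
    return [t for t in traces if t.answer == ground_truth[t.problem]]

def tpt_round(model, fine_tune, problems, ground_truth):
    """Step 3 (Train): fine-tune on the pruned traces; `fine_tune` is a
    placeholder for supervised fine-tuning and returns the updated model."""
    kept = prune(generate_traces(model, problems), ground_truth)
    return fine_tune(model, kept)
```

Repeating `tpt_round` for multiple rounds yields the closed loop: each round's model generates the training data for the next, with ground-truth verification acting as the quality filter.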
Caia Costello
Department of Computer Science, Stanford University
Simon Guo
Stanford University
Computer Systems · Machine Learning
Anna Goldie
Department of Computer Science, Stanford University
Azalia Mirhoseini
Assistant Professor of Computer Science, Stanford - Google DeepMind
AI · Systems · Scaling