Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms

πŸ“… 2025-06-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
High-quality, diverse synthetic mathematics data generation typically relies on teacher models or human annotations, which creates scalability and cost bottlenecks. Method: This paper introduces SPARQ, a framework in which a single large language model (LLM) both generates and solves problem-answer pairs, using a problem's solve-rate as an unsupervised, quantitative proxy for its difficulty. Contributions/Results: SPARQ combines (i) unsupervised quality evaluation grounded in solve-rate; (ii) a Quality-Diversity (QD) algorithm that jointly optimizes problem quality and diversity; (iii) difficulty-based filtering of generated data before fine-tuning; and (iv) ablations attributing generalization gains to data quantity, quality, and diversity. From 7.5K seed problems, SPARQ synthesizes over 20 million problem-solution pairs. Fine-tuning the same model on the filtered data yields relative performance gains of up to 24% across in-distribution (ID) and out-of-distribution (OOD) benchmarks, with diversity filtering in particular improving OOD robustness. The paper also confirms model and data scaling laws for synthetically generated problems.

πŸ“ Abstract
Large language model (LLM) driven synthetic data generation has emerged as a powerful method for improving model reasoning capabilities. However, most methods either distill large state-of-the-art models into small students or use natural ground-truth problem statements to guarantee problem statement quality. This limits the scalability of these approaches to more complex and diverse problem domains. To address this, we present SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms, a novel approach for generating high-quality and diverse synthetic math problem and solution pairs using only a single model by measuring a problem's solve-rate: a proxy for problem difficulty. Starting from a seed dataset of 7.5K samples, we generate over 20 million new problem-solution pairs. We show that filtering the generated data by difficulty and then fine-tuning the same model on the resulting data improves relative model performance by up to 24%. Additionally, we conduct ablations studying the impact of synthetic data quantity, quality and diversity on model generalization. We find that higher quality, as measured by problem difficulty, facilitates better in-distribution performance. Further, while generating diverse synthetic data does not as strongly benefit in-distribution performance, filtering for more diverse data facilitates more robust OOD generalization. We also confirm the existence of model and data scaling laws for synthetically generated problems, which positively benefit downstream model generalization.
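The solve-rate proxy described in the abstract can be sketched as follows: sample several solutions per problem, score the fraction that reach the reference answer, and keep only problems in a target difficulty band. This is a minimal illustration, not the paper's implementation; the `solver` callable, the attempt count, and the band thresholds (`low`, `high`) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    statement: str
    answer: str  # reference final answer

def solve_rate(problem, solver, n_attempts=8):
    """Fraction of sampled solutions whose final answer matches the reference.
    `solver` stands in for sampling a solution from the same LLM."""
    correct = sum(solver(problem.statement) == problem.answer
                  for _ in range(n_attempts))
    return correct / n_attempts

def filter_by_difficulty(problems, solver, low=0.1, high=0.7, n_attempts=8):
    """Keep problems that are neither near-unsolvable (rate < low)
    nor trivial (rate > high); returns (problem, rate) pairs."""
    kept = []
    for p in problems:
        r = solve_rate(p, solver, n_attempts)
        if low <= r <= high:
            kept.append((p, r))
    return kept
```

With a deterministic solver stub, a problem it always answers correctly gets solve-rate 1.0 and is filtered out as trivial, while one it never answers gets 0.0 and is filtered out as unsolvable.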
Problem

Research questions and friction points this paper is trying to address.

Generating diverse synthetic math problems via quality-diversity algorithms
Improving model reasoning by filtering synthetic data for difficulty and diversity
Scaling synthetic problem generation to enhance model generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Quality-Diversity Algorithms for synthetic problem generation
Generates diverse math problems via solve-rate difficulty proxy
Filters synthetic data by difficulty and diversity to improve downstream performance
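The Quality-Diversity idea behind these bullets can be sketched with a MAP-Elites-style archive: problems are binned by a diversity descriptor, and each bin keeps only the highest-quality item seen so far. The descriptor and quality functions below (e.g. a topic label and a difficulty score) are hypothetical stand-ins, not the paper's actual feature space.

```python
def archive_insert(archive, item, descriptor, quality):
    """MAP-Elites-style update: one cell per descriptor value,
    keeping only the highest-quality item seen for that cell."""
    cell = descriptor(item)
    incumbent = archive.get(cell)
    if incumbent is None or quality(item) > quality(incumbent):
        archive[cell] = item
    return archive
```

Repeatedly generating candidates and inserting them this way pressures the pool toward both coverage of the descriptor space (diversity) and high per-cell quality, which is the co-optimization the summary refers to.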