TheoremForge: Scaling up Formal Data Synthesis with Low-Budget Agentic Workflow

📅 2026-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of open-source formalized mathematical data, which stems from the high cost of agent-based workflows. To overcome this limitation, the authors propose a low-cost, scalable data synthesis framework that decomposes formalization into five sub-tasks: statement formalization, proof generation, premise selection, proof correction, and proof sketching. A decoupled extraction strategy recovers valuable training signals from failed trajectories, increasing proof-generation data yield by 1.6× compared to standard filtering and substantially improving data utilization. Using a staged agent workflow built on the Gemini-3-Flash model, the approach achieves a 12.6% verified rate on a 2,000-problem benchmark, outperforming the 8.6% baseline, while reducing the average cost per successful trajectory to just $0.481.

📝 Abstract
The high cost of agentic workflows in formal mathematics hinders large-scale data synthesis, exacerbating the scarcity of open-source corpora. To address this, we introduce TheoremForge, a cost-effective formal data synthesis pipeline that decomposes the formalization process into five sub-tasks: statement formalization, proof generation, premise selection, proof correction, and proof sketching. By implementing a Decoupled Extraction Strategy, the workflow recovers valid training signals from globally failed trajectories, effectively utilizing otherwise wasted computation. Experiments on a 2,000-problem benchmark demonstrate that TheoremForge achieves a Verified Rate of 12.6%, surpassing the 8.6% baseline, at an average cost of only $0.481 per successful trajectory using Gemini-3-Flash. Crucially, our strategy increases data yield by 1.6× for proof generation compared to standard filtering. These results establish TheoremForge as a scalable framework for constructing a data flywheel to train future expert models. Our code is available at https://github.com/timechess/TheoremForge.
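To make the Decoupled Extraction Strategy concrete, here is a minimal, hypothetical sketch of the idea as described in the abstract: even when a formalization trajectory fails end-to-end, sub-task steps that individually pass verification are kept as training examples instead of being discarded by whole-trajectory filtering. All function names, field names, and data shapes below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of decoupled extraction: salvage verified sub-task
# outputs from a trajectory that failed as a whole. Field names such as
# "subtask" and "verified" are illustrative, not from the paper.

def extract_training_signals(trajectory):
    """Return (subtask, input, output) triples for every step that was
    independently verified, regardless of overall trajectory success."""
    return [
        (step["subtask"], step["input"], step["output"])
        for step in trajectory
        if step["verified"]  # e.g. the proof checker accepted this step's output
    ]

# A failed trajectory: proof generation failed, but statement
# formalization and premise selection succeeded on their own.
trajectory = [
    {"subtask": "statement_formalization", "input": "nl statement",
     "output": "formal statement", "verified": True},
    {"subtask": "premise_selection", "input": "formal statement",
     "output": ["lemma_a", "lemma_b"], "verified": True},
    {"subtask": "proof_generation", "input": "formal statement",
     "output": "sorry", "verified": False},
]

signals = extract_training_signals(trajectory)
# Standard filtering would discard this whole trajectory; decoupled
# extraction recovers the two verified sub-task examples.
```

Under this reading, the reported 1.6× gain in proof-generation data yield comes from harvesting such partially successful trajectories rather than filtering on end-to-end success alone.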
Problem

Research questions and friction points this paper is trying to address.

formal mathematics
data synthesis
agentic workflow
open-source corpora
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

formal data synthesis
agentic workflow
decoupled extraction strategy
theorem proving
cost-efficient AI
Yicheng Tao
Carnegie Mellon University
Natural Language Processing · Smart Cities
Hongteng Xu
Gaoling School of Artificial Intelligence, Renmin University of China; Beijing Key Laboratory of Research on Large Models and Intelligent Governance; Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE