OptProver: Bridging Olympiad and Optimization through Continual Training in Formal Theorem Proving

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current formal theorem provers excel at Olympiad-level mathematics but degrade on undergraduate optimization problems, such as those involving convexity and optimality conditions, because of distribution shift. This work proposes a domain-specific transfer approach for optimization. Starting from a strong Olympiad-level prover, it constructs a large-scale optimization dataset via expert iteration and introduces a preference learning objective that combines perplexity weighting with a penalty on valid but non-progressing proof steps, steering proof search toward efficient trajectories. On a newly curated Lean 4 optimization benchmark, the method achieves state-of-the-art Pass@1 and Pass@32 among models of comparable size while remaining competitive on general theorem-proving tasks, indicating effective cross-domain transfer without catastrophic forgetting.
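
To make the problem setting concrete, here is a minimal Lean 4 sketch of the kind of statement an undergraduate optimization problem might reduce to. It is a hypothetical toy example assuming Mathlib is available, not an item from the paper's benchmark. The `have` line illustrates one possible reading of a "valid but non-progressing" step: it type-checks but does not bring the goal any closer to being closed.

```lean
import Mathlib

-- Hypothetical toy statement (not from the paper's benchmark):
-- x = 1 is a global minimizer of f(x) = x^2 - 2x on ℝ, where f(1) = -1.
example : ∀ x : ℝ, x ^ 2 - 2 * x ≥ -1 := by
  intro x
  -- A valid but non-progressing step: accepted by Lean, yet useless here.
  have _h : x = x := rfl
  -- The step that does the work: (x - 1)^2 ≥ 0 expands to the claim.
  nlinarith [sq_nonneg (x - 1)]
```

A search procedure that emits many steps like the `have` line spends its budget without advancing the proof, which is the behavior the penalty term described above is meant to discourage.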

📝 Abstract

Recent advances in formal theorem proving have focused on Olympiad-level mathematics, leaving undergraduate domains largely unexplored. Optimization, fundamental to machine learning, operations research, and scientific computing, remains underserved by existing provers. Its reliance on domain-specific formalisms (convexity, optimality conditions, and algorithmic analysis) creates a significant distribution shift, making naive domain transfer ineffective. We present OptProver, a prover trained to transfer robustly from Olympiad mathematics to undergraduate optimization. Starting from a strong Olympiad-level prover, our pipeline mitigates distribution shift through two key innovations. First, we employ large-scale optimization-focused data curation via expert iteration. Second, we introduce a specialized preference learning objective that integrates perplexity-weighted optimization with a mechanism to penalize valid but non-progressing proof steps. This not only addresses distribution shift but also guides the search toward efficient trajectories. To enable rigorous evaluation, we construct a novel benchmark in Lean 4 focused on optimization. On this benchmark, OptProver achieves state-of-the-art Pass@1 and Pass@32 among comparably sized models while maintaining competitive performance on general theorem-proving tasks, demonstrating effective domain transfer without catastrophic forgetting.
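
The Pass@1 and Pass@32 figures mentioned above are instances of the Pass@k metric: the probability that at least one of k sampled proof attempts is accepted by the verifier. The abstract does not say how the paper estimates it, so the sketch below assumes the commonly used unbiased estimator computed from n samples per problem, of which c are correct. The function names and the example counts are hypothetical.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate for one problem.

    n: number of sampled proof attempts
    c: number of those attempts that verified
    k: attempt budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k attempts failed, every size-k subset contains a success.
    if n - c < k:
        return 1.0
    # 1 - P(all k attempts drawn without replacement are failures).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average the per-problem estimate over a benchmark (assumed aggregation)."""
    if not correct_counts:
        raise ValueError("correct_counts must be non-empty")
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Hypothetical counts of verified proofs out of 32 samples per problem.
    counts = [0, 8, 32, 1]
    print("Pass@1 :", mean_pass_at_k(counts, n=32, k=1))
    print("Pass@32:", mean_pass_at_k(counts, n=32, k=32))
```

With k = 1 the estimator reduces to the fraction of correct samples, c / n, and with k = n it is 1 whenever at least one sample verified, so Pass@32 with 32 samples simply counts the problems solved at least once.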

Problem

Research questions and friction points this paper is trying to address.

formal theorem proving

optimization

distribution shift

domain transfer

undergraduate mathematics

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual training

distribution shift mitigation

preference learning