AI Summary
This work addresses the autoformalization of natural-language mathematical problems into Lean 4, focusing on the gap between syntactic validity and semantic correctness. It introduces IndiMathBench, a human-verified benchmark of 312 problems from Indian Mathematics Olympiads, each paired with an expert-validated Lean 4 formalization. The benchmark is built with a human-AI collaborative pipeline that combines category-based retrieval, multi-model ensemble generation, iterative refinement driven by compiler feedback, and expert validation through an interactive dashboard with automated quality summaries. Experiments show that frontier models fall well short on semantic correctness and achieve low proof success rates even with iterative refinement, which indicates that the benchmark is difficult. IndiMathBench thus offers a human-verified testbed for research on mathematical reasoning and autoformalization.
Abstract
We introduce IndiMathBench, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered human-assisted pipeline for formalizing natural language problems in Lean. IndiMathBench is composed of 312 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Through category-based retrieval, iterative compiler feedback, and multi-model ensembles, our pipeline generates candidate formalizations that experts efficiently validate via an interactive dashboard with automated quality summaries. Evaluation across multiple frontier models shows that autoformalization remains challenging, with substantial gaps between syntactic validity and semantic correctness, and that theorem-proving success rates remain low even with iterative refinement, making IndiMathBench a challenging testbed for mathematical reasoning. IndiMathBench is available at https://github.com/prmbiy/IndiMathBench.
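The iterative compiler feedback step mentioned above can be pictured with a minimal sketch. This is not the authors' implementation: the function name `refine_with_compiler_feedback`, the `generate` and `compile_check` callables, and the round limit are all hypothetical, and the toy stand-ins replace a real model and a real Lean toolchain so that the loop runs on its own.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
#   generate(problem, feedback) -> candidate Lean 4 source as a string
#   compile_check(source)       -> (ok, error_message)
Generator = Callable[[str, Optional[str]], str]
Compiler = Callable[[str], Tuple[bool, str]]


def refine_with_compiler_feedback(
    problem: str,
    generate: Generator,
    compile_check: Compiler,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Return (candidate, rounds_used); candidate is None if nothing compiled."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(problem, feedback)
        ok, error = compile_check(candidate)
        if ok:
            # Compiling only establishes syntactic validity; semantic
            # correctness still has to be judged by an expert.
            return candidate, round_no
        feedback = error  # pass the compiler error to the next attempt
    return None, max_rounds


# Toy stand-ins so the sketch runs without a model or a Lean installation.
def fake_generate(problem: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "theorem t : 1 + 1 = 2 := by sory"  # first attempt has a typo
    return "theorem t : 1 + 1 = 2 := by sorry"


def fake_compile(source: str) -> Tuple[bool, str]:
    if source.endswith("by sorry"):
        return True, ""
    return False, "unknown tactic"


result = refine_with_compiler_feedback(
    "Show that 1 + 1 = 2.", fake_generate, fake_compile
)
```

In this toy run the first candidate fails to compile and the second succeeds, so the loop stops after two rounds. A candidate returned this way is only known to compile, which is why the pipeline described in the abstract still routes candidates to experts for validation.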