IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

๐Ÿ“… 2025-11-30
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the automatic formalization of natural-language mathematical problems into the theorem-proving language Lean 4, targeting the critical gap between syntactic validity and semantic correctness in order to improve proof success rates. To this end, we introduce IndiMathBench, the first human-verified benchmark of 312 Indian Mathematical Olympiad problems with precise, expert-curated Lean 4 formalizations. We propose a human-AI collaborative formalization framework integrating category-based retrieval, multi-model ensemble generation, compilation-feedback-driven iterative refinement, and expert validation, supported by an automated quality-assessment dashboard. Experiments reveal that state-of-the-art models exhibit substantial deficiencies in semantic correctness and achieve low proof success rates, confirming the benchmark's difficulty. IndiMathBench thus establishes a rigorous, reproducible evaluation platform for advancing research on mathematical reasoning and automated formalization.
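To make the benchmark's (informal, formal) pairing concrete, here is an illustrative toy example of the shape such an entry takes. This statement is *not* from IndiMathBench; the real entries are olympiad problems, and `toy_even` is a hypothetical name used only for this sketch. Following the common convention for formalization benchmarks, the proof is left as `sorry`, since proving the statement is the evaluated task.

```lean
-- Informal statement: "Show that for every natural number n,
-- n * (n + 1) is even."
-- A Lean 4 formalization of that statement might look like:
theorem toy_even (n : Nat) : 2 ∣ n * (n + 1) := by
  sorry  -- benchmark items ship statements; producing the proof is the task
```

Checking that such a file elaborates (modulo `sorry`) verifies syntactic validity; whether the formal statement faithfully captures the informal one is the semantic-correctness question the paper highlights.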

๐Ÿ“ Abstract
We introduce IndiMathBench, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered, human-assisted pipeline for formalizing natural-language problems in Lean. IndiMathBench comprises 312 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Through category-based retrieval, iterative compiler feedback, and multi-model ensembles, our pipeline generates candidate formalizations that experts efficiently validate via an interactive dashboard with automated quality summaries. Evaluation across multiple frontier models shows that autoformalization remains challenging, with substantial gaps between syntactic validity and semantic correctness, while theorem-proving success rates remain low even with iterative refinement, demonstrating that the benchmark presents a challenging testbed for mathematical reasoning. IndiMathBench is available at https://github.com/prmbiy/IndiMathBench.
Problem

Research questions and friction points this paper is trying to address.

Autoformalizes natural language math problems into Lean
Evaluates theorem proving via human-verified benchmark
Assesses semantic correctness gaps in autoformalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-human pipeline for formalizing natural language math problems
Iterative compiler feedback and multi-model ensembles for candidate generation
Expert validation via interactive dashboard with automated summaries
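The compilation-feedback loop in the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate_candidate` and `compile_lean` are hypothetical stand-ins for the model call and the Lean 4 checker (a real pipeline would invoke an LLM API and shell out to the Lean toolchain, e.g. via `lake`).

```python
# Hedged sketch of compilation-feedback-driven iterative refinement.
# Both helpers below are stubs invented for illustration; only the
# loop structure mirrors the pipeline described in the paper.

def compile_lean(source: str) -> list[str]:
    """Stub Lean 4 check: return error messages (empty list = compiles).
    A real pipeline would run the Lean compiler on the candidate file."""
    errors = []
    if "sorry" in source:
        errors.append("declaration uses 'sorry'")
    if "theorem" not in source:
        errors.append("no theorem declaration found")
    return errors

def generate_candidate(problem: str, feedback: list[str]) -> str:
    """Stub model call: a real pipeline would prompt an LLM with the
    informal problem plus any compiler errors from the last attempt."""
    if feedback:  # pretend the model repairs the candidate on retry
        return "theorem sum_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b"
    return "theorem sum_comm (a b : Nat) : a + b = b + a := sorry"

def refine(problem: str, max_rounds: int = 3) -> tuple[str, bool]:
    """Alternate generation and compilation, feeding errors back to the
    generator until the candidate compiles or the budget is exhausted."""
    feedback: list[str] = []
    candidate = ""
    for _ in range(max_rounds):
        candidate = generate_candidate(problem, feedback)
        feedback = compile_lean(candidate)
        if not feedback:
            return candidate, True
    return candidate, False

candidate, ok = refine("Show that addition of naturals is commutative.")
print(ok)  # the stub repairs its own 'sorry' on round 2, so this prints True
```

Note that this loop only enforces syntactic validity; the expert dashboard in the pipeline exists precisely because a compiling candidate can still misstate the informal problem.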
๐Ÿ”Ž Similar Papers
No similar papers found.