Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

📅 2025-05-24
🤖 AI Summary
In mathematical competitions, answer-construction problems require both creative candidate synthesis and rigorous formal verification: a combination where large language models (LLMs) lack formal reasoning capability and symbolic provers struggle with efficient hypothesis generation. Method: We propose ECP, the first modular neuro-symbolic framework that integrates LLMs (e.g., gpt-4.1-mini) for pattern-driven enumeration and conjecture generation with Lean's rigorous formal verification. Contribution/Results: We introduce ConstructiveBench, the first benchmark comprising 3,431 answer-construction problems fully annotated with Lean proofs. On this benchmark, ECP achieves a 45.06% answer-construction accuracy, significantly surpassing the chain-of-thought baseline (14.54%). When integrated with DeepSeek-Prover-V2-7B, ECP attains a 25.01% end-to-end provable rate, substantially outperforming purely symbolic approaches.
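The Enumerate-Conjecture-Prove loop described above can be illustrated on a toy problem. In this minimal sketch, all function names are illustrative (not from the paper's code), a fixed candidate table stands in for the LLM conjecturer, and a numeric spot-check stands in for Lean's formal verification:

```python
# Toy sketch of the Enumerate-Conjecture-Prove loop (illustrative only).

def enumerate_instances(f, n_max):
    # Enumerate stage: compute the answer on small instances.
    return [(n, f(n)) for n in range(1, n_max + 1)]

def conjecture(data):
    # Conjecture stage: in ECP an LLM proposes a closed form from the
    # enumerated pattern; here we scan a fixed candidate table.
    candidates = {"n^2": lambda n: n * n, "2n-1": lambda n: 2 * n - 1}
    for name, g in candidates.items():
        if all(g(n) == v for n, v in data):
            return name, g
    return None, None

def verify(f, g, n_max):
    # Prove stage stand-in: ECP hands the conjecture to a Lean prover;
    # here we only spot-check a larger range numerically.
    return all(f(n) == g(n) for n in range(1, n_max + 1))

# Toy problem: closed form for the sum of the first n odd numbers.
target = lambda n: sum(2 * k - 1 for k in range(1, n + 1))
data = enumerate_instances(target, 8)
name, g = conjecture(data)
print(name, verify(target, g, 100))  # → n^2 True
```

The key design point mirrored here is the separation of concerns: the creative guess (conjecture) is cheap and fallible, while acceptance is gated entirely by the verifier.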

📝 Abstract
Mathematical reasoning lies at the heart of artificial intelligence, underpinning applications in education, program verification, and research-level mathematical discovery. Mathematical competitions, in particular, present two challenging problem types: theorem-proving, requiring rigorous proofs of stated conclusions, and answer-construction, involving hypothesizing and formally verifying mathematical objects. Large Language Models (LLMs) effectively generate creative candidate answers but struggle with formal verification, while symbolic provers ensure rigor but cannot efficiently handle creative conjecture generation. We introduce the Enumerate-Conjecture-Prove (ECP) framework, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving. We present ConstructiveBench, a dataset of 3,431 answer-construction problems from various math competitions with verified Lean formalizations. On the ConstructiveBench dataset, ECP improves the accuracy of answer construction from the Chain-of-Thought (CoT) baseline of 14.54% to 45.06% with the gpt-4.1-mini model. Moreover, when combined with ECP's constructed answers, the state-of-the-art DeepSeek-Prover-V2-7B model generates correct proofs for 858 of the 3,431 constructive problems in Lean, achieving 25.01% accuracy, compared to 9.86% for symbolic-only baselines. Our code and dataset are publicly available at GitHub and HuggingFace, respectively.
Problem

Research questions and friction points this paper is trying to address.

Bridging LLMs and symbolic provers for math answer-construction
Improving formal verification of creative conjectures in math competitions
Enhancing accuracy in solving answer-construction problems via ECP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLM enumeration with symbolic proving
Uses pattern-driven conjecturing for creative solutions
Raises answer-construction accuracy on ConstructiveBench from 14.54% (CoT baseline) to 45.06%
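Each ConstructiveBench problem pairs a candidate answer term with a theorem certifying it. A hypothetical Lean 4 / Mathlib example of that shape (the statement, names, and values below are illustrative, not taken from the dataset):

```lean
import Mathlib

-- Candidate answer produced in the Conjecture stage.
abbrev answer : ℕ := 25

-- The Prove stage must close this goal to certify the answer:
-- the sum of the first five odd numbers equals `answer`.
theorem sum_first_five_odds :
    (Finset.range 5).sum (fun k => 2 * k + 1) = answer := by
  decide
```

Separating the answer into an `abbrev` keeps the construction and its certification decoupled, so a wrong conjecture simply fails to prove rather than silently being accepted.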