Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in automated theorem proving for mathematics and quantum physics: the tension between creative reasoning and formal correctness, poor cross-domain generalization, and the lack of effective mechanisms for human–AI collaboration. It proposes a tool-augmented multi-agent framework that combines the reasoning capabilities of large language models (LLMs) with formal verification in the Lean theorem prover, connected through the Model Context Protocol (MCP). Unlike specialized prover systems, the approach generalizes across abstract algebra, quantum theory, and cryptography. It is competitive with state-of-the-art provers on public benchmarks (e.g., MiniF2F) and largely outperforms them on two new Lean benchmarks in abstract algebra and quantum theory. In a practical use case, it helped an expert mathematician formalize the proof of a complex cryptography theorem, demonstrating both autonomous and collaborative operation.

📝 Abstract
We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.
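To make the phrase "strict syntactic rigor" concrete, the following is a minimal Lean 4 sketch of what a formally checked statement looks like. It is a toy illustration only: the theorem name `toy_add_comm` is hypothetical, and neither statement is taken from the paper or its benchmarks.

```lean
-- Toy example (not from the paper): commutativity of addition on Nat.
-- Lean accepts the proof only if every step type-checks.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A closed arithmetic fact, checked by definitional unfolding.
example : 2 + 2 = 4 := rfl
```

If the proof term were wrong or even slightly malformed, Lean would reject the file. This is the kind of feedback a tool-equipped prover can use to revise a candidate proof.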
Problem

Research questions and friction points this paper is trying to address.

Automated theorem proving across diverse scientific domains
Combining creative reasoning with formal syntactic rigor
Generalizable formal verification methodology for mathematics and physics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system using LLMs for theorem proving
Integrates Lean tools via Model Context Protocol
Generalizable methodology for formal verification
Marco Del Tredici
Axiomatic AI
Jacob McCarran
Axiomatic AI
Benjamin Breen
Axiomatic AI
Javier Aspuru Mijares
Axiomatic AI
Weichen Winston Yin
Axiomatic AI
Jacob M. Taylor
Axiomatic AI
Frank Koppens
ICFO
Physics
Dirk Englund
Professor of Electrical Engineering and Computer Science, MIT
quantum information, machine learning, artificial intelligence