Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work aims to explain why agentic theorem-proving systems succeed empirically, and where their performance comes from, even though proof search is hard in the classical worst-case sense. To this end, we model the proving process as a time-bounded Markov decision process (MDP) and introduce the notion of “statistical provability”: the probability, averaged over a distribution of problem instances, that a system reaches a verified proof within a bounded number of steps. Exploiting the Bellman structure of this MDP, we establish the existence of optimal policies under mild regularity conditions, derive provability certificates from sub- and super-solution inequalities, and bound the performance gap of score-guided planning methods in terms of approximation error, sequential statistical complexity, metric entropy and doubling structure of the representation, and tail bounds on action-gap margins. The analysis gives a theoretical justification for the effectiveness of agentic provers on realistic, biased problem distributions and also makes explicit their fundamental limitations in worst-case or adversarial scenarios.

📝 Abstract

Agentic theorem provers -- pipelines that couple a mathematical reasoning model with library retrieval, a subgoal-decomposition/search planner, and a proof assistant verifier -- have recently achieved striking empirical success, yet it remains unclear which components drive performance and why such systems work at all despite the classical hardness of proof search. We propose a distributional viewpoint and introduce **statistical provability**, defined as the finite-horizon success probability of reaching a verified proof, averaged over an instance distribution, and formalize modern theorem-proving pipelines as time-bounded MDPs. Exploiting Bellman structure, we prove existence of optimal policies under mild regularity, derive provability certificates via sub-/super-solution inequalities, and bound the performance gap of score-guided planning (greedy/top-$k$/beam/rollouts) in terms of approximation error, sequential statistical complexity, representation geometry (metric entropy/doubling structure), and action-gap margin tails. Together, our theory provides a principled, component-sensitive explanation of when and why agentic theorem provers succeed on biased real-world problem distributions, while clarifying limitations in worst-case or adversarial regimes.
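
To make the definition of statistical provability concrete, the following is a minimal sketch on a toy problem, not the paper's construction. It assumes a finite state space with tabular transition probabilities, an absorbing "proved" state, and a two-instance distribution; the function names (`optimal_provability`, `statistical_provability`, `make_instance`) and all numbers are hypothetical. Backward induction on the Bellman recursion gives the best achievable probability of reaching the proved state within the step budget, and averaging that value over instances gives the statistical provability.

```python
def optimal_provability(P, goal, horizon):
    """Max probability of reaching a goal state within `horizon` steps.

    P[a][s][t] is the (assumed known) probability of moving from state s
    to state t under action a; goal[s] is True for verified-proof states.
    Returns a list V with V[s] = optimal success probability from state s.
    """
    n_states = len(goal)
    # With zero steps remaining, success means already being at a goal state.
    V = [1.0 if g else 0.0 for g in goal]
    for _ in range(horizon):
        # Bellman backup: goal states stay successful; otherwise pick the
        # action with the best expected success probability one step later.
        V = [
            1.0 if goal[s] else max(
                sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(len(P))
            )
            for s in range(n_states)
        ]
    return V


def make_instance(p, q):
    """Toy instance (hypothetical). States: 0 = open goal, 1 = dead end, 2 = proved.

    Action 0 ("safe") proves with probability p, otherwise stays at the open goal.
    Action 1 ("risky") proves with probability q, otherwise hits the dead end.
    """
    safe = [[1 - p, 0.0, p], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    risky = [[0.0, 1 - q, q], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [safe, risky]


def statistical_provability(instances, weights, goal, horizon, start=0):
    """Finite-horizon optimal success probability averaged over instances."""
    return sum(
        w * optimal_provability(P, goal, horizon)[start]
        for P, w in zip(instances, weights)
    )


GOAL = [False, False, True]
# A "biased" distribution: the easy instance is three times as likely as the hard one.
INSTANCES = [make_instance(0.5, 0.6), make_instance(0.1, 0.3)]
WEIGHTS = [0.75, 0.25]

if __name__ == "__main__":
    for T in (1, 2, 5):
        print(T, statistical_provability(INSTANCES, WEIGHTS, GOAL, T))
```

In this toy setting the value is non-decreasing in the step budget, and the optimal action can change with the remaining budget: with one step left the risky action is better, while with more steps the safe, retryable action wins. A score-guided planner that ignores the remaining horizon would lose some of that value, which is the kind of gap the abstract's bounds are meant to quantify.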

Problem

Research questions and friction points this paper is trying to address.

agentic theorem proving

statistical provability

mathematical reasoning

proof search

distributional performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

statistical provability

agentic theorem proving

time-bounded MDP