What are the Right Symmetries for Formal Theorem Proving?

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the sensitivity of LLM-based formal theorem provers to semantically equivalent but syntactically distinct problem formulations: proof success rates vary across such formulations because the provers do not respect the structural symmetries of formal mathematics. The paper introduces rewriting categories, a category-theoretic framework, and uses it to formalize two symmetry notions: proof equivariance and success invariance. It proposes aggregating over equivalent rewrites of the input at test time to restore success invariance. Theoretical analysis shows that this aggregation recovers success invariance in the sampling limit, and empirical evaluation shows that it improves the robustness and performance of theorem provers under a fixed inference budget.

📝 Abstract

Formal theorem provers based on large language models (LLMs) are highly sensitive to superficial variations in problem representation: semantically equivalent statements can exhibit drastically different proof success rates, revealing a failure to respect structural symmetries inherent in formal mathematics. This raises a central question: what are the right symmetries for formal theorem proving? We introduce rewriting categories, a category-theoretic framework capturing the compositional, generally non-invertible transformations induced by proof tactics, and use it to formalize two symmetry notions: proof equivariance, governing how proof distributions transform under rewrites, and success invariance (i.e., invariance of success probability), requiring equivalent statements to be solved with the same probability. We observe that state-based next-tactic provers naturally satisfy proof equivariance by operating on proof states. In contrast, state-of-the-art LLM-based provers satisfy neither property, exhibiting large performance variation across equivalent formulations. To mitigate this, we propose test-time methods that aggregate over equivalent rewritings of the input, showing theoretically that they recover success invariance in the sampling limit and empirically that they improve robustness and performance under fixed inference budgets. Our results highlight symmetry as a key missing inductive bias in LLM-based theorem proving and suggest test-time computation as a practical route to approximate it.
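
The test-time aggregation idea in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the paper's method: the statements, the per-formulation success probabilities, and the names `toy_rewrites`, `toy_prover`, `prove_with_aggregation`, and `success_rate` are all hypothetical, and uniform sampling over rewrites is just one possible aggregation rule. The toy also assumes every formulation can reach the same full set of equivalent rewrites, which is a simplification given that the abstract describes rewrites as generally non-invertible.

```python
import random
from typing import Callable, List, Optional

# Hypothetical toy data: three formulations of one theorem, each with an
# invented per-attempt success probability for some fixed prover.
TOY_SUCCESS_PROB = {
    "a + b = b + a": 0.60,
    "b + a = a + b": 0.05,
    "forall x y, x + y = y + x": 0.20,
}


def toy_rewrites(statement: str) -> List[str]:
    """Return all equivalent formulations in a canonical (sorted) order.

    Assumption: every formulation reaches the same equivalence class.
    """
    if statement not in TOY_SUCCESS_PROB:
        raise KeyError(statement)
    return sorted(TOY_SUCCESS_PROB)


def toy_prover(statement: str, rng: random.Random) -> Optional[str]:
    """Stand-in for one sampled proof attempt; succeeds at random."""
    if rng.random() < TOY_SUCCESS_PROB[statement]:
        return f"proof of: {statement}"
    return None


def prove_with_aggregation(
    statement: str,
    rewrites: Callable[[str], List[str]],
    prover: Callable[[str, random.Random], Optional[str]],
    budget: int,
    rng: random.Random,
) -> Optional[str]:
    """Spend a fixed budget of attempts, each on a uniformly sampled rewrite."""
    variants = rewrites(statement)
    for _ in range(budget):
        variant = rng.choice(variants)
        proof = prover(variant, rng)
        if proof is not None:
            return proof
    return None


def success_rate(statement: str, budget: int, trials: int,
                 seed: int, aggregate: bool) -> float:
    """Monte Carlo estimate of success probability for one input formulation."""
    rng = random.Random(seed)
    # Without aggregation, the only "rewrite" is the input statement itself.
    rewrites = toy_rewrites if aggregate else (lambda s: [s])
    wins = sum(
        prove_with_aggregation(statement, rewrites, toy_prover, budget, rng)
        is not None
        for _ in range(trials)
    )
    return wins / trials


if __name__ == "__main__":
    for s in sorted(TOY_SUCCESS_PROB):
        direct = success_rate(s, budget=4, trials=2000, seed=0, aggregate=False)
        agg = success_rate(s, budget=4, trials=2000, seed=0, aggregate=True)
        print(f"{s!r}: direct={direct:.3f} aggregated={agg:.3f}")
```

In this toy, the direct success rate depends strongly on which formulation is given, while the aggregated rate is the same for every input because each attempt samples from the same set of rewrites. The aggregated rate falls between the best and worst direct rates here, so the gain is largest for inputs that happen to be phrased unfavorably.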

Problem

Research questions and friction points this paper is trying to address.

symmetry

formal theorem proving

large language models

proof invariance

rewriting

Innovation

Methods, ideas, or system contributions that make the work stand out.

rewriting categories

proof equivariance

success invariance

symmetry

test-time aggregation
