PROMISE: Proof Automation as Structural Imitation of Human Reasoning

📅 2026-04-06
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Automated generation of formal proofs still faces significant challenges in modeling deep structural dependencies when tackling large-scale interactive theorem proving tasks. This work reframes proof generation as a stateful search process grounded in proof-state transitions and introduces a structure-aware proof mining mechanism. By jointly leveraging large language models, structural embeddings, and iterative search, the approach effectively captures structural dependencies among proof strategies, overcoming the limitations of conventional single-pass generation or shallow retrieval methods. Evaluated on the seL4 benchmark, the proposed method achieves up to a 26-percentage-point absolute improvement (186% relative gain) over existing systems such as Selene and Rango, while demonstrating robust performance across diverse large language model backends.
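The two headline figures can be related with a short calculation. This is only a sketch: it assumes the 26-point gain and the 186% relative gain describe the same comparison, and the symbols $b$ (baseline success rate) and $p$ (PROMISE success rate), both in percent, are introduced here for illustration. The implied values are not reported in the text above.

```latex
\begin{align*}
p - b &= 26, \qquad \frac{p - b}{b} = 1.86 \\
b &= \frac{26}{1.86} \approx 14.0, \qquad p = b + 26 \approx 40.0
\end{align*}
```

Under that assumption, the best-case comparison would correspond to a baseline near 14% and a PROMISE success rate near 40%.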
📝 Abstract
Automated proof generation for formal software verification remains largely unresolved despite advances in large language models (LLMs). While LLMs perform well in NLP, vision, and code generation, formal verification still requires substantial human effort. Interactive theorem proving (ITP) demands manual proof construction under strict logical constraints, which limits scalability; for example, verifying the seL4 microkernel required person-decades of effort. Existing LLM-based approaches attempt to automate this process but remain limited. Most rely on single-shot generation or shallow retrieval, which may work for small proofs but fails to scale to large, interdependent verification tasks with deep structural dependencies. We present PROMISE (PROof MIning via Structural Embeddings), a structure-aware framework that reframes proof generation as a stateful search over proof-state transitions. Instead of surface-level retrieval, PROMISE mines structural patterns from proof states and tactic transitions, enabling retrieval and adaptation of compatible proof fragments during iterative search. We evaluate PROMISE on the seL4 benchmark across multiple LLM backends and compare it with prior systems such as Selene and Rango. PROMISE consistently outperforms prior methods, with improvements of up to 26 percentage points (a 186% relative gain), and remains robust across models, demonstrating the effectiveness of structure-aware proof mining for scalable theorem proving.
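The idea of a stateful search over proof-state transitions, guided by structurally similar mined fragments, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not PROMISE's actual implementation: the three-tactic "prover" (`apply_tactic`), the token-count `signature` standing in for a structural embedding, the `MINED` corpus, and the best-first `search` loop. The LLM adaptation step described in the abstract is omitted; retrieval alone proposes tactics.

```python
from collections import Counter
import heapq
import math

# Toy stand-in for an ITP: a goal is a string; a tactic returns the list of
# subgoals it produces, or None if it does not apply. (Hypothetical tactics.)
def apply_tactic(tactic, goal):
    if tactic == "split" and " & " in goal:
        left, right = goal.split(" & ", 1)
        return [left, right]
    if tactic == "trivial" and goal == "True":
        return []
    if tactic == "refl" and "=" in goal:
        lhs, rhs = goal.split("=", 1)
        return [] if lhs.strip() == rhs.strip() else None
    return None

# Hypothetical structural embedding: keep connectives, abstract identifiers to "x".
def signature(goal):
    return Counter(t if t in {"&", "=", "True"} else "x" for t in goal.split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical mined corpus of (proof state, tactic that advanced it) pairs.
MINED = [("p = p", "refl"), ("True", "trivial"), ("p = p & q = q", "split")]

def suggest(goal, k=2):
    # Retrieve the tactics whose source states look most like the current goal.
    sig = signature(goal)
    ranked = sorted(MINED, key=lambda item: cosine(sig, signature(item[0])), reverse=True)
    return [tactic for _, tactic in ranked[:k]]

def search(goal, max_steps=50):
    # Best-first search; priority is the number of open goals (fewer is better).
    # Frontier entries: (priority, tie-breaker, open goals, tactic script so far).
    frontier = [(1, 0, (goal,), [])]
    counter, seen = 1, set()
    while frontier and max_steps > 0:
        max_steps -= 1
        _, _, goals, script = heapq.heappop(frontier)
        if not goals:
            return script  # no open goals left: proof found
        if goals in seen:
            continue
        seen.add(goals)
        head, rest = goals[0], goals[1:]
        for tactic in suggest(head):
            subgoals = apply_tactic(tactic, head)
            if subgoals is None:
                continue  # tactic failed on this state
            new_goals = tuple(subgoals) + rest
            heapq.heappush(frontier, (len(new_goals), counter, new_goals, script + [tactic]))
            counter += 1
    return None

print(search("a = a & True"))
```

The search keeps the full list of open goals as its state, so each tactic application is a transition between proof states rather than a one-shot guess at the whole proof. A real system would replace the toy prover with calls into the theorem prover and the token counts with learned embeddings.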
Problem

Research questions and friction points this paper is trying to address.

automated theorem proving
formal verification
interactive theorem proving
proof automation
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

structure-aware proof mining
proof-state transitions
automated theorem proving
formal verification
large language models