🤖 AI Summary
Formal verification of software suffers from high manual proof-writing costs, while direct integration of large language models (LLMs) incurs prohibitive computational overhead and lacks formal trustworthiness. Method: This paper proposes Strat2Rocq, an LLM strategy extraction and formal migration framework that automatically distills implicit proof strategies from LLM-generated natural-language reasoning traces; these are then formalized via abstraction and agent-assisted error correction into reusable Rocq lemmas. Contribution/Results: To our knowledge, this is the first end-to-end strategy-distillation pipeline transferring LLM reasoning capabilities to a symbolic prover, namely CoqHammer running within the Rocq proof assistant. Evaluated on open-source Rocq projects for software verification, the method improves CoqHammer’s theorem-proving success rate by 13.41%, substantially enhancing automation in formal verification. The approach establishes a new paradigm for synergistic verification, bridging LLM-based reasoning with rigorous, machine-checkable proofs.
📝 Abstract
One important approach to software verification is interactive theorem proving. However, writing formal proofs requires substantial human effort, making proof automation highly valuable. Traditionally, proof automation has relied on symbolic provers. Recently, large language models (LLMs) have demonstrated strong theorem-proving capabilities that complement symbolic provers. Nonetheless, prompting LLMs can be expensive and may pose security risks for confidential codebases. Purely symbolic approaches therefore remain important even in the LLM era: they are cost-effective, secure, and complementary to the strengths of LLMs.
Motivated by these considerations, we ask a new research question: can we extract the internal strategies of LLMs to enhance the capabilities of symbolic provers? As an initial attempt to answer this question, we propose Strat2Rocq, which extracts proof strategies from LLMs and formalizes them as lemmas in Rocq. These lemmas are accessible to symbolic provers such as CoqHammer, and with their addition CoqHammer is able to prove more theorems. The knowledge extraction process analyzes the proof trajectories of LLMs on a training set of proved theorems: for each theorem, we prompt the LLM to generate a natural-language proof, then ask it to summarize this proof into formalized lemmas with proofs. We also employ a standard agentic approach to mitigate errors during formalization. Our evaluation demonstrates that, on open-source Rocq projects for software verification, Strat2Rocq enhances the success rate of CoqHammer by 13.41%.
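To make the pipeline concrete, an extracted lemma of the kind described above might look like the following. This is a hypothetical sketch: the lemma name, statement, and proof are illustrative and not taken from the paper; the point is that the distilled strategy ends up as an ordinary, machine-checked Rocq lemma that hammer-style tactics can later retrieve and apply.

```coq
(* Hypothetical example of a distilled strategy: "the length of a
   concatenation does not depend on the order of the operands",
   formalized as a reusable Rocq lemma. *)
From Coq Require Import List Arith.
Import ListNotations.

Lemma length_app_comm :
  forall (A : Type) (l1 l2 : list A),
    length (l1 ++ l2) = length (l2 ++ l1).
Proof.
  intros A l1 l2.
  rewrite !app_length.   (* |l ++ l'| = |l| + |l'| on both sides *)
  apply Nat.add_comm.    (* reduces the goal to commutativity of + *)
Qed.
```

Once such lemmas are checked and added to the project context, CoqHammer's premise selection can pick them up when attacking new goals, which is how the extracted strategies raise its success rate.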