🤖 AI Summary
This work addresses the poor generalization and exposure bias of large language models (LLMs) in formal theorem proving. Methodologically, it proposes a reinforcement learning framework in which the LLM serves as a policy network, optimized via a PPO variant within formal environments such as Lean, without relying on direct supervised fine-tuning. The approach rolls out candidate tactics from the policy and compares them against the expected ones, jointly orchestrating policy sampling, reward modeling, and multi-step tactic generation, thereby reducing the dependence on purely imitative training over expert proof trajectories. Its core contribution is introducing this explicit comparison between policy rollouts and expected tactics into LLM-based theorem-proving training, which mitigates exposure bias. Experiments show higher proof accuracy than a directly fine-tuned LLM baseline, suggesting that reinforcement-driven policy search improves reasoning robustness and generalization.
📝 Abstract
To take advantage of Large Language Models (LLMs) in theorem formalization and proving, we propose a reinforcement learning framework that iteratively optimizes a pretrained LLM by rolling out next tactics and comparing them with the expected ones. Experimental results show that this approach achieves higher accuracy than a directly fine-tuned LLM.
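To make the rollout-and-compare idea concrete, here is a minimal toy sketch of such a loop. Everything in it is an illustrative assumption, not the paper's implementation: the tactic vocabulary, the binary match reward, and a plain REINFORCE-style update standing in for the PPO variant; a table of per-step logits stands in for the LLM policy.

```python
# Toy sketch of a rollout-vs-expected-tactic RL loop (illustrative assumptions
# throughout: tactic vocabulary, binary reward, REINFORCE instead of PPO,
# a logit table instead of an LLM policy network).
import math
import random

random.seed(0)

TACTICS = ["intro", "apply", "rewrite", "simp", "exact"]  # toy tactic vocabulary

# Stand-in policy: one logit vector per proof step.
logits = [[0.0] * len(TACTICS) for _ in range(3)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

expected = ["intro", "apply", "exact"]  # expected tactic sequence for one goal
lr = 0.5

for epoch in range(200):
    for step, gold in enumerate(expected):
        probs = softmax(logits[step])
        a = sample(probs)                 # roll out a tactic from the policy
        reward = 1.0 if TACTICS[a] == gold else 0.0  # compare with expected
        # REINFORCE update: grad of log pi(a) w.r.t. logits is one_hot(a) - probs.
        for i in range(len(TACTICS)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[step][i] += lr * reward * grad

learned = [TACTICS[max(range(len(TACTICS)), key=lambda i: logits[s][i])]
           for s in range(len(expected))]
print(learned)
```

Because the reward is nonzero only when the rolled-out tactic matches the expected one, each such match pushes probability mass toward the expected tactic, and the greedy policy converges to the expected sequence.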