🤖 AI Summary
This work addresses a gap in formal automated theorem proving research, which has focused predominantly on mathematics and computer science and lacks systematic approaches for physics. To bridge this gap, the study proposes a framework that combines conjecture generation with formal verification to construct PhysLeanData, presented as the first physics-specific dataset for formal reasoning. Using this dataset, the authors train PhysProver, a model built on DeepSeek-Prover-V2-7B, with reinforcement learning with verifiable rewards (RLVR). With only about 5K training samples, PhysProver achieves an overall improvement of 2.4% across multiple physics sub-domains and a 1.3% improvement on the MiniF2F-Test mathematical benchmark, which suggests that physics-focused formal training also transfers to general formal mathematical reasoning.
📝 Abstract
The combination of verifiable languages and LLMs has significantly influenced both the mathematical and computer science communities because it provides a rigorous foundation for theorem proving. Recent advances in the field have produced foundation models and sophisticated agentic systems that bring formal mathematical reasoning closer to the natural-language reasoning capability of LLMs. However, little attention has been given to formal physics reasoning, which relies heavily on similar problem-solving and theorem-proving frameworks. To address this gap, this paper presents, to the best of our knowledge, the first approach to enhancing formal theorem proving in the physics domain. We construct a dedicated dataset, PhysLeanData, composed of theorems sampled from PhysLean and data generated by a conjecture-based formal data generation pipeline. For training, we start from DeepSeek-Prover-V2-7B, a strong open-source mathematical theorem prover, and apply Reinforcement Learning with Verifiable Rewards (RLVR) to obtain our model, PhysProver. Comprehensive experiments show that, using only ~5K training samples, PhysProver achieves an overall 2.4% improvement across multiple sub-domains. Furthermore, after formal physics training, we observe a 1.3% gain on the MiniF2F-Test benchmark, indicating non-trivial generalization beyond physics and improved formal mathematical capability as well. These results highlight the effectiveness and efficiency of our approach, which provides a paradigm for extending formal provers beyond mathematical domains. To foster further research, we will release both our dataset and model to the community.
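The abstract names RLVR but does not spell out the reward or the update rule. As a rough illustration only, the sketch below assumes three things: the reward is binary (1 when a Lean checker accepts the generated proof, 0 otherwise), the conjecture pipeline keeps only statement/proof pairs that the checker accepts, and advantages are normalized within a group of sampled proofs in the style of GRPO. None of these choices is confirmed by the abstract. The names `Sample`, `verifiable_reward`, `keep_verified`, and `group_advantages` are hypothetical, and the toy verifier (which rejects any proof containing Lean's `sorry` placeholder) stands in for an actual call to the Lean 4 toolchain.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List

# Hypothetical verifier signature: (statement, proof) -> True if the proof checks.
# In a real setup this would invoke the Lean 4 compiler; here it is an assumption.
Verifier = Callable[[str, str], bool]


@dataclass
class Sample:
    statement: str  # formal theorem statement (e.g. a generated conjecture)
    proof: str      # candidate proof text produced by the prover


def verifiable_reward(sample: Sample, verify: Verifier) -> float:
    """Assumed binary RLVR reward: 1.0 if the checker accepts the proof, else 0.0."""
    return 1.0 if verify(sample.statement, sample.proof) else 0.0


def keep_verified(samples: List[Sample], verify: Verifier) -> List[Sample]:
    """Assumed data-filtering step: keep only pairs that pass formal verification."""
    return [s for s in samples if verify(s.statement, s.proof)]


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-normalized advantages (an assumption, not from the source)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; 0.0 when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy verifier: treats any proof containing `sorry` as failing.
    toy_verify = lambda stmt, prf: "sorry" not in prf
    batch = [Sample("theorem t1 : True", "by trivial"),
             Sample("theorem t2 : True", "by sorry")]
    rewards = [verifiable_reward(s, toy_verify) for s in batch]
    print(rewards)  # [1.0, 0.0]
    print(group_advantages(rewards))
```

A binary, checker-based reward is attractive in this setting because it cannot be gamed by plausible-looking but invalid proofs; the group normalization simply gives proofs that verify a positive signal relative to their siblings from the same prompt.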