Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

📅 2025-10-29

🤖 AI Summary

Formal reasoning in university-level physics remains challenging because rigorous, machine-verifiable frameworks are lacking.

Method: We introduce Lean4PHYS, a Lean4-based framework for physics formalization, comprising (1) LeanPhysBench, a benchmark of 200 problems spanning mechanics, electromagnetism, and other core domains, drawn from university textbooks and physics competitions; (2) PhysLib, an open-source physics knowledge library supporting unit-system modeling, foundational theorems, and an extensible axiomatization; and (3) a community-driven knowledge curation paradigm. Baselines are evaluated with expert Lean4 math provers and closed-source large language models.

Contributions/Results: This work establishes what the authors describe as the first university-level physics benchmark in Lean4; PhysLib improves the proof accuracy of mainstream models by 11.75% on average; and the analysis identifies critical bottlenecks, including symbolic rigor, unit consistency, and multi-step causal chain modeling, indicating that structured domain knowledge is essential for robust physical reasoning.

📝 Abstract

We present **Lean4PHYS**, a comprehensive reasoning framework for college-level physics problems in Lean4. **Lean4PHYS** includes *LeanPhysBench*, a college-level benchmark for formal physics reasoning in Lean4, which contains 200 hand-crafted and peer-reviewed statements derived from university textbooks and physics competition problems. To establish a solid foundation for formal reasoning in physics, we also introduce *PhysLib*, a community-driven repository containing fundamental unit systems and theorems essential for formal physics reasoning. Using the benchmark and the Lean4 repository composed in **Lean4PHYS**, we report baseline results for major expert Math Lean4 provers and state-of-the-art closed-source models: the best results are only 16% for DeepSeek-Prover-V2-7B and 35% for Claude-Sonnet-4. We also conduct a detailed analysis showing that *PhysLib* yields an average improvement of 11.75% in model performance. These results demonstrate the challenging nature of *LeanPhysBench* and the effectiveness of *PhysLib*. To the best of our knowledge, this is the first study to provide a physics benchmark in Lean4.

Problem

Research questions and friction points this paper is trying to address.

Developing a formal reasoning framework for college physics problems

Creating a benchmark and theorem repository for physics verification

Evaluating AI provers on challenging physics theorem proving
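
To make the proving task concrete, here is a minimal sketch of what a formal physics statement could look like in Lean4. It is a hypothetical example written for illustration, not a problem from LeanPhysBench; the theorem name and hypotheses are assumptions, and it presumes Mathlib is available for the real numbers and the `norm_num` tactic.

```lean
import Mathlib

-- Hypothetical statement for illustration; it is NOT taken from LeanPhysBench.
-- A cart starts from rest and accelerates uniformly at 2 m/s² for 5 s.
-- Units are left implicit here, so every quantity is a bare real number.
theorem final_velocity_example (v a t : ℝ)
    (ha : a = 2)        -- acceleration in m/s²
    (ht : t = 5)        -- elapsed time in s
    (hv : v = a * t) :  -- kinematics from rest: v = a t
    v = 10 := by
  -- Substitute the kinematic relation and the given values, then compute.
  rw [hv, ha, ht]
  norm_num
```

A prover is handed the statement and must produce the tactic proof after `by`. Because the quantities here are bare reals, nothing stops a proof from adding a time to a velocity, which is one reason unit consistency is named as a bottleneck.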

Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework for college physics reasoning in Lean4

Benchmark with 200 hand-crafted physics statements

Repository with fundamental units and physics theorems
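
As a rough illustration of unit-system modeling, the sketch below tracks dimensions as integer exponents and checks two familiar formulas for dimensional consistency. All names (`Dim`, `massDim`, `accel`, and so on) are hypothetical and chosen for this example; this is not PhysLib's actual design or API. It uses only core Lean4.

```lean
-- Hypothetical sketch; names are illustrative and NOT taken from PhysLib.

/-- Exponents of the base dimensions length, mass, and time. -/
structure Dim where
  length : Int
  mass   : Int
  time   : Int
  deriving DecidableEq, Repr

namespace Dim

/-- Multiplying two quantities adds their dimension exponents. -/
def mul (a b : Dim) : Dim :=
  ⟨a.length + b.length, a.mass + b.mass, a.time + b.time⟩

def massDim  : Dim := ⟨0, 1, 0⟩    -- M
def velocity : Dim := ⟨1, 0, -1⟩   -- L T⁻¹
def accel    : Dim := ⟨1, 0, -2⟩   -- L T⁻²
def force    : Dim := ⟨1, 1, -2⟩   -- M L T⁻²
def energy   : Dim := ⟨2, 1, -2⟩   -- M L² T⁻²

end Dim

/-- Newton's second law is dimensionally consistent: [m][a] = [F]. -/
example : Dim.mul Dim.massDim Dim.accel = Dim.force := rfl

/-- Kinetic energy is dimensionally consistent: [m][v][v] = [E]. -/
example :
    Dim.mul Dim.massDim (Dim.mul Dim.velocity Dim.velocity) = Dim.energy := rfl
```

Both checks hold by computation, so a mismatched formula such as equating `[m][v]` with `Dim.force` would be rejected by Lean rather than slipping through. A fuller library would also attach these dimensions to numeric values and cover the remaining base dimensions.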
