Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Formal reasoning in university-level physics remains challenging due to the lack of rigorous, machine-verifiable frameworks. Method: We introduce the first systematic Lean4-based framework for physics formalization, comprising (1) LeanPhysBench—a benchmark of 200 problems spanning mechanics, electromagnetism, and other core domains, drawn from canonical textbooks and competitions; (2) PhysLib—an open-source physics knowledge library supporting unit-system modeling, foundational theorems, and an extensible axiomatization; and (3) a community-driven knowledge curation paradigm, integrated with closed-source large language models for automated theorem proving experiments. Contributions/Results: This work establishes the first university-physics–level Lean4 benchmark; PhysLib improves average proof accuracy of mainstream models by 11.75%; and our analysis identifies critical bottlenecks—including symbolic rigor, unit consistency, and multi-step causal chain modeling—demonstrating that structured domain knowledge is essential for robust physical reasoning.

📝 Abstract
We present **Lean4PHYS**, a comprehensive reasoning framework for college-level physics problems in Lean4. **Lean4PHYS** includes *LeanPhysBench*, a college-level benchmark for formal physics reasoning in Lean4, which contains 200 hand-crafted and peer-reviewed statements derived from university textbooks and physics competition problems. To establish a solid foundation for formal reasoning in physics, we also introduce *PhysLib*, a community-driven repository containing fundamental unit systems and theorems essential for formal physics reasoning. Based on the benchmark and Lean4 repository composing **Lean4PHYS**, we report baseline results for leading expert math Lean4 provers and state-of-the-art closed-source models: the best expert prover, DeepSeek-Prover-V2-7B, achieves only 16%, while Claude-Sonnet-4 achieves 35%. We also conduct a detailed analysis showing that our *PhysLib* yields an average improvement of 11.75% in model performance. This demonstrates the challenging nature of our *LeanPhysBench* and the effectiveness of *PhysLib*. To the best of our knowledge, this is the first study to provide a physics benchmark in Lean4.
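To make the flavor of unit-aware formal physics concrete, here is a minimal Lean4 sketch of the kind of statement such a framework might contain. All names (`Dim`, `Quantity`, `qmul`, the dimension constants) are illustrative assumptions, not actual PhysLib definitions.

```lean
-- Hypothetical sketch, not PhysLib code: dimensions tracked as
-- integer exponents over (length, time).
structure Dim where
  length : Int
  time   : Int
deriving DecidableEq

def Dim.mul (a b : Dim) : Dim :=
  ⟨a.length + b.length, a.time + b.time⟩

-- A physical quantity pairs a magnitude with its dimension.
structure Quantity (d : Dim) where
  val : Float

def velocity : Dim := ⟨1, -1⟩
def duration : Dim := ⟨0, 1⟩
def distance : Dim := ⟨1, 0⟩

-- Multiplying quantities composes their dimensions.
def qmul {d₁ d₂ : Dim} (a : Quantity d₁) (b : Quantity d₂) :
    Quantity (d₁.mul d₂) := ⟨a.val * b.val⟩

-- Unit consistency is checked by the type system:
-- velocity · time has the dimension of distance.
example : velocity.mul duration = distance := by decide

-- Uniform motion s = v · t, stated so that a dimension
-- mismatch would be a type error, not a runtime bug.
example (v : Quantity velocity) (t : Quantity duration) :
    (qmul v t).val = v.val * t.val := rfl
```

The point of the sketch is the design choice the paper highlights: encoding dimensions in types makes unit consistency a compile-time obligation, which is one of the bottlenecks the analysis identifies for current provers.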
Problem

Research questions and friction points this paper is trying to address.

- Developing a formal reasoning framework for college-level physics problems
- Creating a benchmark and theorem repository for physics verification
- Evaluating AI provers on challenging physics theorem proving
Innovation

Methods, ideas, or system contributions that make the work stand out.

- A framework for college-level physics reasoning in Lean4
- A benchmark with 200 hand-crafted physics statements
- A repository with fundamental unit systems and physics theorems
Authors
- Yuxin Li, Hong Kong University of Science and Technology
- Minghao Liu, Hong Kong University of Science and Technology
- Ruida Wang, University of Illinois Urbana-Champaign
- Wenzhao Ji, Hong Kong University of Science and Technology
- Zhitao He, Hong Kong University of Science and Technology
- Rui Pan, University of Illinois Urbana-Champaign
- Junming Huang, Princeton University
- Tong Zhang, University of Illinois Urbana-Champaign
- Yi R. Fung, Hong Kong University of Science and Technology