Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

📅 2025-04-15
🤖 AI Summary
This work addresses weak model reasoning capability and the disconnect between formal verification and informal mathematical intuition in formal theorem proving. We propose a reasoning-driven reinforcement learning (RL) paradigm for building large-scale formal reasoning models targeting Lean 4. Methodologically, we conduct large-scale RL training on Qwen2.5-72B, introduce a novel "formal reasoning pattern" that structurally encodes human problem-solving strategies, and apply knowledge distillation to obtain efficient lightweight models (1.5B/7B). Key contributions include: (1) establishing a strong positive scaling law between model size and performance in neural theorem provers; (2) the first organic integration of formal verification with informal mathematical intuition; (3) achieving 80.7% pass@8192 on miniF2F, setting a new SOTA, with superior pass@1 accuracy and strong computational scalability; and (4) open-sourcing the distilled models to advance community research.
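For reference, pass@k reports whether at least one of k sampled proofs is accepted by the Lean checker. A standard unbiased estimator computes this from n total samples with c verified successes; the sketch below is illustrative (the function name `pass_at_k` is ours, not code from the paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one
    of k attempts drawn without replacement from n total attempts
    (c of them verified correct) solves the problem."""
    if n - c < k:
        # Fewer failures than draws: every k-subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# If 3 of 16 sampled proofs check, pass@8 is already very high:
print(pass_at_k(16, 3, 8))  # → 0.9
```

Reporting pass@8192, as the abstract does, simply means the sampling budget k is 8192 attempts per problem.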

📝 Abstract
We introduce Kimina-Prover Preview, a large language model that pioneers a novel reasoning-driven exploration paradigm for formal theorem proving, as showcased in this preview release. Trained with a large-scale reinforcement learning pipeline from Qwen2.5-72B, Kimina-Prover demonstrates strong performance in Lean 4 proof generation by employing a structured reasoning pattern we term the "formal reasoning pattern". This approach allows the model to emulate human problem-solving strategies in Lean, iteratively generating and refining proof steps. Kimina-Prover sets a new state-of-the-art on the miniF2F benchmark, reaching 80.7% with pass@8192. Beyond improved benchmark performance, our work yields several key insights: (1) Kimina-Prover exhibits high sample efficiency, delivering strong results even with minimal sampling (pass@1) and scaling effectively with computational budget, stemming from its unique reasoning pattern and RL training; (2) we demonstrate clear performance scaling with model size, a trend previously unobserved for neural theorem provers in formal mathematics; (3) the learned reasoning style, distinct from traditional search algorithms, shows potential to bridge the gap between formal verification and informal mathematical intuition. We open-source distilled versions of Kimina-Prover with 1.5B and 7B parameters.
Problem

Research questions and friction points this paper is trying to address.

Develops a large language model for formal theorem proving using reinforcement learning.
Introduces a structured reasoning pattern to emulate human proof generation in Lean 4.
Achieves state-of-the-art performance on the miniF2F benchmark with an 80.7% pass rate at pass@8192.
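For context, the prover's task is to emit a Lean 4 proof that the Lean kernel can check mechanically. A toy example (ours, not taken from the paper) of the kind of verified output involved:

```lean
-- A toy Lean 4 theorem and proof of the kind a neural prover emits;
-- the Lean kernel checks the proof term, yielding the binary
-- verified/failed signal that can serve as the RL reward.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```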
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale reinforcement learning pipeline for training the prover
A formal reasoning pattern that emulates human proof strategies
Performance that scales consistently with model size
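The scaling claim can be made concrete by fitting a power law, i.e. a straight line in log-log space, to pass rate versus parameter count. The sketch below uses purely hypothetical data points for illustration, not the paper's measurements:

```python
import math

# Hypothetical (model size in billions of parameters, miniF2F pass rate)
# points for illustration only; these are NOT the paper's measurements.
points = [(1.5, 0.55), (7.0, 0.63), (72.0, 0.72)]

# Least-squares slope in log-log space = the power-law exponent.
xs = [math.log(size) for size, _ in points]
ys = [math.log(rate) for _, rate in points]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
print(f"fitted power-law exponent: {slope:.3f}")
```

A positive exponent indicates the pass rate grows predictably as model size increases, which is the "clear performance scaling" the abstract reports.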
👥 Authors
Haiming Wang
Mert Unsal
Xiaohan Lin
Mantas Baksys
Junqi Liu
Marco Dos Santos
Flood Sung (Moonshot AI)
Marina Vinyes
Zhenzhe Ying
Zekai Zhu
Jianqiao Lu (Seed Foundation Model team)
Bolton Bailey (University of Illinois Urbana-Champaign)
Chendong Song
Chenjun Xiao
Dehao Zhang (University of Electronic Science and Technology of China)
Ebony Zhang
Han Zhu
Jiawei Liu
Julien Michel
Longhui Yu (Kimi & University of Toronto & Peking University)
Léo Dreyfus-Schmidt
Lewis Tunstall (Hugging Face)
Moreira Machado
Pauline Bourigault
Ran Wang
Stanislas Polu
Thibaut Barroyer
Wen-Ding Li (Cornell University)
Yazhe Niu
Yann Fleureau
Yangyang Hu
Zhouliang Yu (The SphereLab, CUHK)
Zihan Wang
Zhilin Yang (Carnegie Mellon University)
Zhengying Liu
Jia Li