CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

📅 2025-07-08
🤖 AI Summary
Problem: Current approaches to translating natural-language mathematical statements into formal Lean 4 code suffer from insufficient semantic fidelity. Method: This paper introduces CriticLean, a framework that elevates the critic module from passive verification to an active learning component. Its critic model, CriticLeanGPT, is trained with supervised fine-tuning and reinforcement learning to judge whether a formalization is semantically correct. The paper also releases FineLeanCorpus, a diverse formalization dataset of over 285K problems, and CriticLeanBench, a dedicated evaluation benchmark. Contribution/Results: Experiments show that CriticLeanGPT significantly outperforms strong open- and closed-source baselines at discriminating correct from incorrect formalizations. The authors present the framework as a step toward semantically reliable formalization in automated theorem proving.

📝 Abstract
Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase, that is, the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, we first propose CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. We then introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 285K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations, and we hope CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
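To make the gap between compilation success and semantic fidelity concrete, here is a small Lean 4 illustration. It is not taken from the paper: the informal statement and both theorem names are invented for this example. Both declarations type-check (with a `sorry` warning), yet only the first says what the informal statement says.

```lean
-- Informal statement (illustrative): "Every natural number n satisfies n ≤ n * n."

-- Faithful formalization: the claim is universally quantified,
-- matching "every natural number".
theorem self_le_square_all : ∀ n : Nat, n ≤ n * n := by
  sorry

-- Unfaithful formalization: this also compiles, but it only asserts
-- that SOME natural number has the property, which is a weaker claim.
theorem self_le_square_some : ∃ n : Nat, n ≤ n * n := by
  sorry
```

A compiler check accepts both statements, so telling them apart requires comparing each one against the original wording, which is the role the abstract assigns to the critic.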
Problem

Research questions and friction points this paper is trying to address.

Enhancing semantic fidelity in math formalization via critic-guided RL
Evaluating correctness of Lean 4 formalizations against original intent
Building diverse datasets for reliable automated theorem proving
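As a rough sketch of how a critic's ability to separate correct from incorrect formalizations could be scored, the snippet below compares a critic's verdicts against gold labels. The function name `critic_scores` and the choice of accuracy, true-positive rate, and true-negative rate are assumptions made for illustration; this summary does not state which metrics CriticLeanBench reports.

```python
from typing import Sequence


def critic_scores(gold: Sequence[bool], predicted: Sequence[bool]) -> dict[str, float]:
    """Score binary critic verdicts against gold labels.

    True means "the formalization is semantically correct".
    The metric set is an illustrative assumption, not the benchmark's definition.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    # Correct formalizations that the critic accepted.
    true_pos = sum(g and p for g, p in zip(gold, predicted))
    # Incorrect formalizations that the critic rejected.
    true_neg = sum((not g) and (not p) for g, p in zip(gold, predicted))
    positives = sum(gold)
    negatives = len(gold) - positives

    return {
        "accuracy": (true_pos + true_neg) / len(gold),
        "tpr": true_pos / positives if positives else float("nan"),
        "tnr": true_neg / negatives if negatives else float("nan"),
    }
```

Reporting the two rates separately shows whether a critic tends to approve everything (high TPR, low TNR) or reject everything (the reverse), which a single accuracy figure can hide.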
Innovation

Methods, ideas, or system contributions that make the work stand out.

Critic-guided reinforcement learning framework
CriticLeanGPT for semantic fidelity assessment
FineLeanCorpus dataset with 285K problems
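One way a critic can steer generation instead of only filtering its output is a retry loop in which compiler and critic feedback is passed back to the generator. The sketch below is a minimal illustration under that assumption. The function `formalize_with_critic`, the `Attempt` record, and the callables `generate`, `compiles`, and `critic` are hypothetical placeholders, not the paper's pipeline or API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    lean_code: str
    compiled: bool
    critic_ok: bool


def formalize_with_critic(
    problem: str,
    generate: Callable[[str, list[str]], str],        # hypothetical: (problem, feedback) -> Lean code
    compiles: Callable[[str], bool],                  # hypothetical: wraps a Lean 4 compiler check
    critic: Callable[[str, str], tuple[bool, str]],   # hypothetical: (problem, code) -> (verdict, reason)
    max_attempts: int = 3,
) -> tuple[Optional[str], list[Attempt]]:
    """Return the first candidate that compiles and passes the critic, plus the attempt history."""
    feedback: list[str] = []
    history: list[Attempt] = []
    for _ in range(max_attempts):
        candidate = generate(problem, feedback)
        if not compiles(candidate):
            # A syntax or type error: the critic is not consulted.
            history.append(Attempt(candidate, compiled=False, critic_ok=False))
            feedback.append("compilation failed")
            continue
        ok, reason = critic(problem, candidate)
        history.append(Attempt(candidate, compiled=True, critic_ok=ok))
        if ok:
            return candidate, history
        # The critic's explanation guides the next generation attempt.
        feedback.append(reason)
    return None, history
```

Running the compiler before the critic keeps the cheaper syntactic check first, and returning the full history preserves rejected candidates, which could serve as negative examples when training or evaluating a critic.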