CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

📅 2025-07-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Current approaches to translating natural-language mathematical statements into formal Lean 4 code suffer from insufficient semantic fidelity. Method: This paper introduces CriticLean, a framework that elevates the critic from a passive verifier to an active learning component. Its critic model, CriticLeanGPT, is trained with supervised fine-tuning and reinforcement learning to judge whether a Lean 4 formalization is semantically correct. The authors also release FineLeanCorpus, a formalization dataset of over 285K problems spanning diverse domains and difficulty levels, alongside CriticLeanBench, a benchmark for evaluating critics. Contribution/Results: Experiments show that CriticLeanGPT significantly outperforms strong open- and closed-source baselines at distinguishing semantically correct from incorrect formalizations. The work argues that optimizing the critic phase is essential for producing reliable formalizations in automated theorem proving.

📝 Abstract

Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase, that is, the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, we first propose CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 285K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations, and we hope CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
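
To make the gap between compilation success and semantic fidelity concrete, here is a small Lean 4 illustration. It is not taken from the paper: the informal problem and both theorem names are hypothetical, and it assumes Mathlib is available for `Even` and `ℕ`. Proofs are left as `sorry` because only the statements matter here.

```lean
import Mathlib

-- Informal problem (hypothetical): "The sum of two even natural numbers is even."

-- Faithful statement: both evenness hypotheses are present.
theorem sum_even_faithful (a b : ℕ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  sorry

-- Unfaithful statement: it still type-checks, but the hypothesis on `b`
-- has been dropped, so it asserts a different (and false) claim,
-- e.g. a = 0, b = 1 is a counterexample.
theorem sum_even_unfaithful (a b : ℕ) (ha : Even a) :
    Even (a + b) := by
  sorry
```

Both statements are accepted by the compiler, so a compilation check alone cannot tell them apart. Deciding that only the first matches the informal problem is the kind of judgment the abstract assigns to the critic.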

Problem

Research questions and friction points this paper is trying to address.

Enhancing semantic fidelity in math formalization via critic-guided RL

Evaluating correctness of Lean 4 formalizations against original intent

Building diverse datasets for reliable automated theorem proving
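
The second point above, judging formalizations against the original intent, can be scored as a binary classification task. The following is a minimal sketch under stated assumptions, not CriticLeanBench's actual protocol: the `Example` record, the `score_critic` function, and the choice of metrics are all hypothetical, and the critic is any callable that returns `True` when it accepts a formalization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Example:
    """Hypothetical benchmark item: problem text, Lean 4 statement, human label."""
    problem: str
    lean_statement: str
    is_faithful: bool  # True if the statement captures the problem's intent


def score_critic(
    critic: Callable[[str, str], bool],
    examples: Iterable[Example],
) -> Dict[str, float]:
    """Compare critic verdicts with human labels and return simple rates."""
    tp = tn = fp = fn = 0
    for ex in examples:
        accepted = critic(ex.problem, ex.lean_statement)
        if accepted and ex.is_faithful:
            tp += 1  # correctly accepted a faithful formalization
        elif accepted and not ex.is_faithful:
            fp += 1  # wrongly accepted an unfaithful formalization
        elif not accepted and ex.is_faithful:
            fn += 1  # wrongly rejected a faithful formalization
        else:
            tn += 1  # correctly rejected an unfaithful formalization
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "true_negative_rate": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

Reporting the true negative rate separately is a deliberate choice in this sketch: a critic that accepts everything can look acceptable on accuracy alone when most examples are faithful, yet it never catches an incorrect formalization.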

Innovation

Methods, ideas, or system contributions that make the work stand out.

Critic-guided reinforcement learning framework

CriticLeanGPT for semantic fidelity assessment

FineLeanCorpus dataset with 285K problems
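
One plausible way a critic can act as an active component rather than a final check is to gate a generate-and-retry loop. The sketch below is an assumption for illustration only, not the paper's pipeline: `formalize_with_critic`, its callable parameters (`generate`, `compiles`, `critic`), and the feedback strings are all hypothetical stand-ins for a formalizer model, a Lean 4 compile check, and a critic model.

```python
from typing import Callable, Optional


def formalize_with_critic(
    problem: str,
    generate: Callable[[str, Optional[str]], str],
    compiles: Callable[[str], bool],
    critic: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate that compiles and is accepted by the critic.

    All three callables are hypothetical placeholders supplied by the caller.
    Returns None if no candidate passes within `max_attempts`.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(problem, feedback)
        # Stage 1: syntactic and type-level check (compilation).
        if not compiles(candidate):
            feedback = "compilation failed"
            continue
        # Stage 2: semantic check against the original problem.
        if critic(problem, candidate):
            return candidate
        feedback = "critic rejected: statement does not match the problem"
    return None
```

Candidates that pass both stages could then be collected into a dataset, while rejected ones are regenerated with the feedback attached.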
