🤖 AI Summary
To address the limited capability of large language models (LLMs) in understanding and generating formal proofs—which drives up the human and computational costs of formal verification—this paper introduces MiniLang, a lightweight, machine-learning-oriented proof language deeply integrated into the Isabelle/HOL ecosystem. MiniLang features a fine-grained syntactic design that helps LLMs model proof structure, and it incorporates an enhanced Sledgehammer mechanism for improved automated reasoning guidance. Evaluated on the PISA benchmark, the approach achieves 69.1% pass@1—surpassing the prior state-of-the-art pass@64—and 79.2% pass@8, significantly outperforming the previous best result on PISA (71.0%). The core contribution is the first executable proof language framework explicitly optimized for LLMs, enabling synergistic improvements in both formal reasoning capability and proof generation efficiency.
📝 Abstract
Neural Theorem Proving (NTP) employs deep learning methods, particularly Large Language Models (LLMs), to automate formal proofs in proof assistants. This approach holds promise for reducing the substantial labor and computation costs of proof engineering, which is fundamental to formal verification and other software engineering methods. The paper explores the potential of improving NTP by redesigning the proof language, given that LLMs' capabilities depend heavily on input representations. We introduce *MiniLang*, a redesigned proof language for Isabelle/HOL incorporating an improved version of Sledgehammer. Experiments show that MiniLang benefits two fine-tuned LLMs, improving the success rate on the PISA benchmark by up to 29% compared to generating Isar proof scripts. The success rate under one attempt (so-called *pass@1*) reaches 69.1%, exceeding Baldur's previous pass@64 (65.7%); the pass@8 reaches 79.2%, exceeding the state of the art on PISA (71.0%) achieved by Magnushammer.