IsaMini: Redesigned Isabelle Proof Language for Machine Learning

📅 2025-07-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited capability of large language models (LLMs) in understanding and generating formal proofs—leading to high human and computational costs in formal verification—this paper introduces MiniLang, a lightweight, machine-learning-oriented proof language, deeply integrated into the Isabelle/HOL ecosystem. MiniLang features fine-grained syntactic design to enhance LLMs’ structural modeling of proofs and incorporates an enhanced Sledgehammer mechanism for improved automated reasoning guidance. Evaluated on the PISA benchmark, our approach achieves 69.1% pass@1—surpassing prior state-of-the-art pass@64—and 79.2% pass@8, significantly outperforming the current SOTA (71.0%). The core contribution is the first executable proof language framework explicitly optimized for LLMs, enabling synergistic improvements in both formal reasoning capability and proof generation efficiency.

📝 Abstract
Neural Theorem Proving (NTP) employs deep learning methods, particularly Large Language Models (LLMs), to automate formal proofs in proof assistants. This approach holds promise for reducing the dramatic labor costs or computation costs required in proof engineering, which is fundamental to formal verification and other software engineering methods. The paper explores the potential of improving NTP by redesigning the proof language, given that LLMs' capabilities depend highly on representations. We introduce MiniLang, a redesigned proof language for Isabelle/HOL incorporating an improved version of Sledgehammer. Experiments show MiniLang benefits two fine-tuned LLMs by improving the success rate on the PISA benchmark by up to 29% in comparison to generation of Isar proof script. The success rate under one attempt (so-called pass@1) reaches 69.1%, exceeding the previous Baldur's pass@64 (65.7%); the pass@8 reaches 79.2%, exceeding the state-of-the-art on PISA (71.0%) achieved by Magnushammer.
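The abstract reports results as pass@1 and pass@8. For context, these metrics are commonly computed with the unbiased pass@k estimator popularized by the Codex paper (Chen et al., 2021); a minimal sketch, with an illustrative function name (the paper itself does not specify its estimator):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn (without replacement) from n total attempts,
    of which c are correct, solves the problem."""
    if n - c < k:
        # Fewer incorrect attempts than k: every draw of k samples
        # must contain at least one correct attempt.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 8 attempts, 4 of them correct: pass@1 estimate is 0.5
print(pass_at_k(8, 4, 1))
```

With k equal to n (e.g. pass@8 measured from exactly 8 attempts), the estimator reduces to "at least one of the n attempts succeeded".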
Problem

Research questions and friction points this paper is trying to address.

Redesigning proof language for better machine learning performance
Reducing labor and computation costs in proof engineering
Improving success rates in automated theorem proving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Redesigned proof language for Isabelle/HOL
Incorporates improved Sledgehammer version
Boosts LLM success rates significantly
Authors
Qiyuan Xu, Nanyang Technological University, Singapore 639798
Renxi Wang, MBZUAI
Haonan Li, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
David Sanan, Singapore Institute of Technology, Singapore 828608
Conrad Watt, Nanyang Technological University