🤖 AI Summary
To address the limited capability of large language models (LLMs) in understanding and generating formal proofs—which drives up the human and computational costs of formal verification—this paper introduces MiniLang, a lightweight, machine-learning-oriented proof language deeply integrated into the Isabelle/HOL ecosystem. MiniLang features a fine-grained syntactic design that helps LLMs model proof structure, and it incorporates an enhanced Sledgehammer mechanism for improved automated reasoning guidance. Evaluated on the PISA benchmark, the approach achieves 69.1% pass@1—surpassing the prior state-of-the-art pass@64—and 79.2% pass@8, significantly outperforming the previous best result on PISA (71.0%). The core contribution is the first executable proof language framework explicitly optimized for LLMs, enabling synergistic improvements in both formal reasoning capability and proof generation efficiency.
📝 Abstract
Neural Theorem Proving (NTP) employs deep learning methods, particularly Large Language Models (LLMs), to automate formal proofs in proof assistants. This approach holds promise for reducing the substantial labor and computation costs of proof engineering, which is fundamental to formal verification and other software engineering methods. The paper explores the potential of improving NTP by redesigning the proof language, given that LLMs' capabilities depend heavily on input representations. We introduce *MiniLang*, a redesigned proof language for Isabelle/HOL incorporating an improved version of Sledgehammer. Experiments show that MiniLang benefits two fine-tuned LLMs, improving the success rate on the PISA benchmark by up to 29% compared to generating Isar proof scripts. The success rate under one attempt (so-called *pass@1*) reaches 69.1%, exceeding Baldur's previous pass@64 (65.7%); the pass@8 reaches 79.2%, exceeding the state of the art on PISA (71.0%) achieved by Magnushammer.