Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast

📅 2024-11-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) can autonomously generate separation logic specifications for C programs that are verifiable by VeriFast, thereby improving the efficiency and practicality of static program verification. The authors conduct the first systematic evaluation of GPT-4o's capability to produce ownership-aware, verifiable separation logic assertions, using both zero-shot and chain-of-thought prompting, across diverse C program inputs. Results show that while the generated specifications generally preserve functional fidelity, only a small fraction pass VeriFast verification, and even those often contain redundant assertions, yielding low overall verification success rates. The key contribution is exposing a critical disconnect between *semantic correctness* and *formal verifiability* in LLM-generated logical specifications. The paper also establishes the first empirical benchmark and failure-mode analysis for LLM-based program verification targeting separation logic, providing foundational insights for future research at the intersection of LLMs and formal methods.

📝 Abstract
Static verification is a powerful method for enhancing software quality, but it demands significant human labor and resources. This is particularly true of static verifiers that reason about heap-manipulating programs using an ownership logic. LLMs have shown promise in a number of software engineering activities, including code generation, test generation, proof generation for theorem provers, and specification generation for static verifiers. However, prior work has not explored how well LLMs can generate specifications based on an ownership logic, such as separation logic. To address this gap, this paper explores the effectiveness of OpenAI's GPT-4o model at generating specifications for C programs that are verifiable with VeriFast, a separation-logic-based static verifier. Our experiment employs three different types of user inputs as well as basic and Chain-of-Thought (CoT) prompting to assess GPT-4o's capabilities. Our results indicate that the specifications generated by GPT-4o preserve functional behavior but struggle to be verifiable; when they are verifiable, they contain redundancies. Future directions for improving performance are discussed.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Separation Logic
Static Verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Separation Logic
Automated Software Verification
👥 Authors
Marilyn Rego (Purdue University, West Lafayette, IN, USA)
Wen Fan (University of California, Berkeley)
Xin Hu (University of Michigan - Ann Arbor, Ann Arbor, MI, USA)
Sanya Dod (Purdue University, West Lafayette, IN, USA)
Zhaorui Ni (Purdue University, West Lafayette, IN, USA)
Danning Xie (Purdue University)
Jenna DiVincenzo (Purdue University, West Lafayette, IN, USA)
Lin Tan (Mary J. Elmore New Frontiers Professor, Computer Science, Purdue University)
LLM4Code · Software reliability · AI · Text analytics · Autoformalization