Can ChatGPT support software verification?

📅 2023-11-04
🏛️ Fundamental Approaches to Software Engineering
📈 Citations: 15
Influential: 0
🤖 AI Summary
Problem: Automatically generating loop invariants for formal software verification remains challenging, because conventional approaches rely heavily on domain expertise and manual effort. Method: This work evaluates whether a large language model (LLM), specifically ChatGPT, can synthesize loop invariants for C programs. It couples the LLM with two formal verifiers, Frama-C and CPAchecker, combining prompt engineering, static program analysis, and ACSL specifications. Contribution/Results: On 106 C programs, ChatGPT produces valid and useful invariants that allow Frama-C to verify tasks it could not solve before. The authors treat this as first evidence that LLMs can serve as assistive tools in program verification workflows, and they propose ways of combining LLMs with software verifiers.
📝 Abstract
Large language models have become increasingly effective in software engineering tasks such as code generation, debugging and repair. Language models like ChatGPT can not only generate code, but also explain its inner workings and in particular its correctness. This raises the question whether we can utilize ChatGPT to support formal software verification. In this paper, we take some first steps towards answering this question. More specifically, we investigate whether ChatGPT can generate loop invariants. Loop invariant generation is a core task in software verification, and the generation of valid and useful invariants would likely help formal verifiers. To provide some first evidence on this hypothesis, we ask ChatGPT to annotate 106 C programs with loop invariants. We check validity and usefulness of the generated invariants by passing them to two verifiers, Frama-C and CPAchecker. Our evaluation shows that ChatGPT is able to produce valid and useful invariants allowing Frama-C to verify tasks that it could not solve before. Based on our initial insights, we propose ways of combining ChatGPT (or large language models in general) and software verifiers, and discuss current limitations and open issues.
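To make the task concrete, the sketch below shows what a loop-invariant annotation can look like in ACSL, the specification language Frama-C reads. The function `count_to_zero` and its contract are hypothetical and are not taken from the paper's 106 benchmark programs; the annotations illustrate the kind of output one might ask a language model for, under the assumption that the verifier is then run on the annotated file.

```c
#include <assert.h>

/* Hypothetical example: counts x up to n while counting y down from n. */
/*@ requires n >= 0;
  @ ensures \result == 0;
  @*/
int count_to_zero(int n) {
    int x = 0;
    int y = n;
    /*@ loop invariant 0 <= x <= n;
      @ loop invariant x + y == n;
      @ loop assigns x, y;
      @ loop variant n - x;
      @*/
    while (x < n) {
        x++;
        y--;
    }
    /* On exit x >= n; with the invariants this gives x == n and y == 0. */
    /*@ assert y == 0; */
    assert(y == 0); /* runtime mirror of the ACSL assertion above */
    return y;
}
```

The two `loop invariant` clauses are the part a verifier typically cannot guess on its own: together with the negated loop condition they imply the final assertion. Whether a given verifier discharges all proof obligations automatically depends on its configuration and is not claimed here.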
Problem

Research questions and friction points this paper is trying to address.

Investigating ChatGPT's ability to generate loop invariants
Evaluating validity and usefulness of AI-generated verification artifacts
Exploring integration of large language models with formal verifiers
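The distinction between validity and usefulness can be illustrated with a small sketch. It assumes a hypothetical loop (`x` counts up to `n`, `y` counts down from `n`, and the goal is `y == 0` at exit) and three hypothetical candidate invariants. The helper `holds_on_run` only tests a candidate on one concrete execution, which can refute an invalid candidate but can never prove validity; the paper instead checks candidates with Frama-C and CPAchecker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical candidates for the loop in holds_on_run. */

/* Valid, but too weak: it says nothing about y, so it cannot imply y == 0. */
bool cand_weak(int x, int y, int n) { (void)y; (void)n; return x >= 0; }

/* Valid and useful: with the exit condition x >= n it implies y == 0. */
bool cand_strong(int x, int y, int n) { return 0 <= x && x <= n && x + y == n; }

/* Invalid: y reaches 0 in the last iteration (and y == 0 at entry if n == 0). */
bool cand_wrong(int x, int y, int n) { (void)x; (void)n; return y > 0; }

typedef bool (*invariant_fn)(int x, int y, int n);

/* Returns true if the candidate held at loop entry and after every iteration
 * of this single run. A false result refutes the candidate; a true result is
 * only evidence, not a proof. */
bool holds_on_run(invariant_fn inv, int n) {
    int x = 0;
    int y = n;
    if (!inv(x, y, n)) {
        return false;
    }
    while (x < n) {
        x++;
        y--;
        if (!inv(x, y, n)) {
            return false;
        }
    }
    return true;
}
```

In this sketch `cand_weak` and `cand_strong` both survive every run, yet only `cand_strong` helps a verifier establish the goal, which is why the evaluation considers usefulness separately from validity.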
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses ChatGPT to generate loop invariants
Combines ChatGPT with Frama-C and CPAchecker
Produces valid invariants that let Frama-C verify tasks it could not solve before
Christian Janssen
Carl-von-Ossietzky Universität Oldenburg, Germany
Cedric Richter
Carl-von-Ossietzky Universität Oldenburg, Germany
Heike Wehrheim
University of Oldenburg
Formal methods, software verification, weak memory models