🤖 AI Summary
This work addresses the limited adoption of formal verification, which often stems from the need for expert-written annotations such as preconditions, postconditions, and loop invariants. To lower this barrier, the authors use large language models (LLMs) to automatically generate Dafny verification annotations from conventional code accompanied by natural-language comments, while assertions from test cases serve as static oracles to validate the generated pre/postconditions. The method iteratively refines candidate annotations over multiple rounds guided by verifier feedback, combining multi-model LLM collaboration with this closed-loop feedback mechanism. A VS Code plugin was developed to support practical deployment. Evaluated on 110 Dafny programs, the approach achieves 98.2% annotation correctness within at most eight repair iterations. The results indicate that proof-helper annotations remain a key source of difficulty for current LLMs, while user feedback on the plugin was notably positive.
📝 Abstract
Recent verification tools aim to make formal verification more accessible to software engineers by automating most of the verification process. However, annotating conventional programs with the formal specification and verification constructs (preconditions, postconditions, loop invariants, auxiliary predicates and functions, and proof helpers) required to prove their correctness still demands significant manual effort and expertise. This paper investigates how LLMs can automatically generate such annotations for programs written in Dafny, a verification-aware programming language, starting from conventional code accompanied by natural-language specifications (in comments) and test code. In experiments on 110 Dafny programs, a multi-model approach combining Claude Opus 4.5 and GPT-5.2 generated correct annotations for 98.2% of the programs within at most 8 repair iterations, using verifier feedback. A logistic regression analysis shows that proof-helper annotations contribute disproportionately to problem difficulty for current LLMs. Assertions in the test cases served as static oracles to automatically validate the generated pre/postconditions. We also compare generated and manual solutions and present a Visual Studio Code extension that integrates automatic generation into the IDE, with encouraging usability feedback.
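The generate-verify-repair loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `call_llm` and `run_verifier` are hypothetical stand-ins for the actual model API and the Dafny verifier invocation, and the 8-iteration cap mirrors the repair budget reported above.

```python
MAX_ITERATIONS = 8  # the paper reports success within at most 8 repair rounds


def annotate_with_repair(program_src, call_llm, run_verifier,
                         max_iters=MAX_ITERATIONS):
    """Ask an LLM for Dafny annotations, then iterate on verifier feedback.

    Returns (annotated_program, repair_rounds_used), or (None, max_iters)
    if the verifier never accepts a candidate.
    """
    # Initial generation from code + natural-language comments.
    prompt = ("Add preconditions, postconditions, loop invariants and "
              "proof helpers to this Dafny program:\n" + program_src)
    candidate = call_llm(prompt)
    for round_no in range(max_iters):
        ok, feedback = run_verifier(candidate)
        if ok:
            return candidate, round_no  # verified; rounds of repair used
        # Closed loop: feed the verifier errors back to the model.
        prompt = ("The Dafny verifier reported:\n" + feedback +
                  "\nRepair the annotations in:\n" + candidate)
        candidate = call_llm(prompt)
    return None, max_iters  # repair budget exhausted
```

In the paper's multi-model setting, `call_llm` would route between models (e.g. one generating, another repairing), and the test-case assertions would add a second acceptance check alongside `run_verifier` to catch pre/postconditions that verify but contradict the intended behavior.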