DafnyPro: LLM-Assisted Automated Verification for Dafny Programs

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes DafnyPro, a framework that automates more of formal verification by generating verification annotations for Dafny programs. Without modifying the original code, DafnyPro combines differential checking, invariant pruning, and prompt augmentation with proof hints to improve the annotations that large language models (LLMs) produce. On the DafnyBench benchmark, Claude Sonnet 3.5 augmented with DafnyPro achieves an 86% correct proof rate, a 16-percentage-point improvement over the base model. Two supervised fine-tuned smaller models, Qwen-7B and Qwen-14B, attain 68% and 70% correct proofs, respectively, which suggests that the approach transfers to smaller models.

๐Ÿ“ Abstract
We present DafnyPro, an inference-time framework that enhances LLMs for generating verification annotations in Dafny. DafnyPro comprises three key components: a diff-checker that prevents modifications to base program logic, a pruner that removes unnecessary invariants, and a hint-augmentation system that retrieves and applies predefined, problem-independent proof strategies. We evaluate DafnyPro using Claude Sonnet 3.5 and 3.7 on four benchmarks: Clover, MBPP-Dafny, HumanEval-Dafny, and DafnyBench, achieving consistent performance gains in all cases. Notably, on DafnyBench, the most challenging benchmark, Claude Sonnet 3.5 enhanced with DafnyPro achieves 86% correct proofs, a 16 pp improvement over the base model. We also fine-tune two Qwen models on training data derived from verification attempts by larger models enhanced with DafnyPro. Our 7B and 14B models achieve 68% and 70% correct proofs on DafnyBench, respectively, demonstrating that smaller models can maintain high verification accuracy.
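The diff-checker and pruner described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the keyword list, the line-based treatment of annotations, the greedy removal order, and the `verifies` callback (a stand-in for a call to the Dafny verifier) are all assumptions made for illustration. Real Dafny annotations can span several lines, which this sketch does not handle.

```python
import re
from typing import Callable, List

# Assumption: annotations sit on their own lines and start with one of these keywords.
ANNOTATION = re.compile(r"^\s*(invariant|decreases|assert)\b")
INVARIANT = re.compile(r"^\s*invariant\b")


def strip_annotations(src: str) -> List[str]:
    """Drop annotation-only and blank lines, then normalize whitespace."""
    kept = []
    for line in src.splitlines():
        if not line.strip() or ANNOTATION.match(line):
            continue
        kept.append(" ".join(line.split()))
    return kept


def same_base_program(original: str, annotated: str) -> bool:
    """Diff check: the annotated program must equal the original once annotations are removed."""
    return strip_annotations(original) == strip_annotations(annotated)


def prune_invariants(annotated: str, verifies: Callable[[str], bool]) -> str:
    """Greedily delete each invariant line whose removal still lets the program verify.

    `verifies` is a hypothetical callback that returns True when the given
    source passes the verifier.
    """
    lines = annotated.splitlines()
    i = 0
    while i < len(lines):
        if INVARIANT.match(lines[i]):
            candidate = lines[:i] + lines[i + 1:]
            if verifies("\n".join(candidate)):
                lines = candidate  # invariant was unnecessary; re-check the same index
                continue
        i += 1
    return "\n".join(lines)
```

In use, an LLM-produced candidate would be rejected when `same_base_program` returns False, and accepted candidates would be passed through `prune_invariants` with a callback that runs the verifier on the candidate source.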
Problem

Research questions and friction points this paper is trying to address.

Automated Verification
Dafny
LLM
Program Verification
Verification Annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted verification
Dafny
invariant pruning
proof hint augmentation
automated program verification