AutoReSpec: A Framework for Generating Specification using Large Language Models

📅 2026-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated generation of verifiable formal specifications is often hindered by syntactic errors, logical inaccuracies, inadequate handling of control-flow structures, and the absence of dynamic error-correction mechanisms. This work proposes AutoReSpec, a novel framework featuring a two-stage collaborative generation mechanism that synergistically combines open- and closed-source large language models. By dynamically selecting prompting strategies based on program structure and invoking a collaborative model upon primary model failure, AutoReSpec leverages feedback from a formal verifier to iteratively refine specifications. Through structure-aware scheduling and a verification-in-the-loop architecture, the approach significantly enhances robustness and efficiency, achieving a 58.2% success rate and 69.2% completeness across 72 Java benchmarks, while reducing average evaluation time by 26.89% compared to existing methods.
📝 Abstract
Formal specification generation has recently drawn attention in software engineering as a way to improve program correctness without requiring manual annotations. Large Language Models (LLMs) have shown promise in this area, but early results reveal several limitations. Generated specifications often fail verification due to syntax errors, logical inaccuracies, or incomplete reasoning, especially in programs with loops or branching logic. Techniques like SpecGen and FormalBench attempt to address this through prompting and benchmarking, but they typically rely on static prompts and do not offer mechanisms for recovering from failure or adapting to different program structures. In this paper, we present AutoReSpec, a collaborative framework that combines open- and closed-source LLMs for verifiable specification generation. AutoReSpec dynamically chooses an LLM pair and prompt configuration based on the structure of the input program. If the primary LLM fails to produce a valid output, a collaborative model is invoked, using validator feedback to refine and correct the specification. This two-stage design enables both speed and robustness. We evaluate AutoReSpec on a new benchmark of 72 real-world and synthetic Java programs. Our results show that it achieves 67 passes out of 72, outperforming SpecGen and FormalBench in both Success Probability and Completeness. Our experimental evaluation achieves a 58.2% success probability and a 69.2% completeness score, while cutting evaluation time by 26.89% on average compared to prior methods. Together, these results demonstrate that AutoReSpec offers a scalable, efficient, and reliable approach to LLM-based formal specification generation.
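The two-stage design described above can be sketched as a verification-in-the-loop pipeline: a structure-aware heuristic picks a prompt, a primary model drafts a specification, and a collaborative model repairs it using verifier feedback. This is a minimal illustrative sketch, not the paper's implementation; the LLM calls and the verifier are toy stubs (`generate_spec`, `verify`, and the prompt-selection heuristic are all assumed names standing in for real model and verifier invocations).

```python
def select_prompt(program: str) -> str:
    """Structure-aware scheduling (assumed heuristic): pick a prompt
    style from the program's control-flow shape."""
    if "while" in program or "for" in program:
        return "loop-invariant prompt"
    if "if" in program:
        return "branch-condition prompt"
    return "basic pre/post-condition prompt"

def generate_spec(program, prompt, feedback=None, model="primary"):
    # Toy stand-in for an LLM call: the "primary" draft omits a loop
    # invariant; the "collaborative" model adds one after feedback.
    spec = "//@ ensures \\result >= 0;"
    if model == "collaborative" and feedback == "missing loop invariant":
        spec = "//@ maintaining i <= n;\n" + spec
    return spec

def verify(program, spec):
    # Toy verifier stub: a loop must carry an invariant clause to pass.
    # The real system would invoke a formal verifier on the annotated program.
    if "while" in program and "maintaining" not in spec:
        return False, "missing loop invariant"
    return True, None

def autorespec(program, max_rounds=3):
    prompt = select_prompt(program)
    # Stage 1: primary model drafts a specification.
    spec = generate_spec(program, prompt)
    ok, feedback = verify(program, spec)
    rounds = 0
    # Stage 2: on failure, a collaborative model refines the draft
    # using the verifier's feedback, up to a bounded number of rounds.
    while not ok and rounds < max_rounds:
        spec = generate_spec(program, prompt, feedback, model="collaborative")
        ok, feedback = verify(program, spec)
        rounds += 1
    return spec if ok else None
```

Bounding the repair loop (`max_rounds`) is one way such a design trades robustness against evaluation time, consistent with the framework's reported efficiency gains.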
Problem

Research questions and friction points this paper is trying to address.

formal specification generation
Large Language Models
program verification
syntax errors
logical inaccuracies
Innovation

Methods, ideas, or system contributions that make the work stand out.

AutoReSpec
formal specification generation
large language models
collaborative LLM framework
validator feedback