SpecGen: Automated Generation of Formal Program Specifications via Large Language Models

📅 2024-01-16

🏛️ arXiv.org

📈 Citations: 9

✨ Influential: 0

🤖 AI Summary

Formal program specifications are difficult, error-prone, and time-consuming to write by hand. SpecGen addresses this with a two-phase LLM-driven approach. In the first phase, a multi-turn conversation guides the LLM to generate specifications for the input program. In the second phase, used when the LLM fails to produce correct specifications, four mutation operators are applied to the model-generated specifications, and a heuristic selection strategy picks verifiable ones from the mutants. This avoids the reliance on predefined templates or grammars that limits existing methods. Evaluated on the SV-COMP Java category benchmark and a manually constructed dataset, SpecGen generates verifiable specifications for 279 of 385 programs, outperforming purely LLM-based approaches and conventional tools such as Houdini and Daikon.

📝 Abstract

Formal program specifications play a crucial role in various stages of software development. However, crafting them manually is difficult, time-consuming, and labor-intensive. It is even more challenging to write specifications that correctly and comprehensively describe the semantics of complex programs. To reduce the burden on software developers, automated specification generation methods have emerged. However, existing methods usually rely on predefined templates or grammars, so they struggle to accurately describe the behavior and functionality of complex real-world programs. To tackle this challenge, we introduce SpecGen, a novel technique for formal program specification generation based on Large Language Models (LLMs). Our key insight is to overcome the limitations of existing methods by leveraging the code comprehension capability of LLMs. The process of SpecGen consists of two phases. The first phase employs a conversational approach that guides the LLM to generate appropriate specifications for a given program. The second phase, designed for cases where the LLM fails to generate correct specifications, applies four mutation operators to the model-generated specifications and selects verifiable specifications from the mutated ones through a novel heuristic selection strategy. We evaluate SpecGen on two datasets: the SV-COMP Java category benchmark and a manually constructed dataset. Experimental results demonstrate that SpecGen succeeds in generating verifiable specifications for 279 out of 385 programs, outperforming existing purely LLM-based approaches and conventional specification generation tools such as Houdini and Daikon. Further investigation of the quality of the generated specifications indicates that SpecGen can comprehensively articulate the behaviors of the input program.
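
The two phases described in the abstract can be illustrated with a toy sketch. This is not SpecGen's implementation, and several parts are assumptions made for illustration: a specification is modeled as a Python boolean expression over an input `x` and a result `r`; the function `verifies` stands in for a formal verifier by testing sample inputs; the mutation tables are hypothetical, since the abstract does not name the four operators; `ask_llm` is a hypothetical callable; feeding the failure message back into the conversation is an assumed design; and picking the first verifying mutant replaces the paper's heuristic selection strategy.

```python
# Toy stand-in for a formal verifier (assumption): a "specification" is a
# Python boolean expression over the input `x` and the result `r`.
def verifies(spec, func, inputs):
    """Return True if `spec` holds for `func` on every sample input.

    A real pipeline would call a deductive verifier instead of testing.
    """
    return all(eval(spec, {}, {"x": x, "r": func(x)}) for x in inputs)


# Hypothetical mutation tables: each maps a token to its replacements.
COMPARATIVE = {
    "<": ["<=", ">"], "<=": ["<", ">="],
    ">": [">=", "<"], ">=": [">", "<="],
    "==": ["!="], "!=": ["=="],
}
LOGICAL = {"and": ["or"], "or": ["and"]}
ARITHMETIC = {"+": ["-"], "-": ["+"]}


def mutate(spec):
    """Yield every spec obtained by replacing exactly one token.

    Tokens are assumed to be separated by whitespace.
    """
    tokens = spec.split()
    for i, tok in enumerate(tokens):
        for table in (COMPARATIVE, LOGICAL, ARITHMETIC):
            for alt in table.get(tok, []):
                yield " ".join(tokens[:i] + [alt] + tokens[i + 1:])


def repair(spec, func, inputs):
    """Phase-two sketch: keep `spec` if it verifies, otherwise return the
    first single-token mutant that verifies, or None if there is none."""
    if verifies(spec, func, inputs):
        return spec
    for candidate in mutate(spec):
        if verifies(candidate, func, inputs):
            return candidate
    return None


def generate(func, inputs, ask_llm, max_turns=3):
    """Phase-one sketch: ask the (hypothetical) LLM for a spec for up to
    `max_turns` turns, passing back a failure message each time; if no
    turn verifies, fall back to mutating the last candidate."""
    feedback = None
    spec = None
    for _ in range(max_turns):
        spec = ask_llm(feedback)
        if verifies(spec, func, inputs):
            return spec
        feedback = f"specification {spec!r} failed verification"
    return repair(spec, func, inputs)


if __name__ == "__main__":
    # A fake "LLM" that always proposes a slightly wrong postcondition.
    print(generate(abs, range(-3, 4), lambda feedback: "r > 0"))
```

In this toy run the proposed postcondition `r > 0` fails for `x = 0`, and a single comparative mutation yields `r >= 0`, which holds on all sample inputs. A candidate such as `r < x` has no verifying single-token mutant, so `repair` returns `None` for it.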

Problem

Research questions and friction points this paper is trying to address.

Automated generation of formal program specifications

Overcoming limitations of predefined templates

Leveraging LLMs for code comprehension

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Large Language Models

Uses a conversational approach

Applies mutation operators

🔎 Similar Papers

An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications