Beyond Postconditions: Can Large Language Models Infer Formal Contracts for Automatic Software Verification?

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world software often lacks formal specifications, hindering the practical deployment of automated verifiers. To address this, we introduce NL2Contract, the task of using large language models (LLMs) to generate complete functional contracts, comprising both preconditions and postconditions, from natural-language cues in code (e.g., function names and comments). Unlike prior work that focuses solely on postcondition generation, this approach mitigates verifier false alarms caused by incomplete specifications. We propose a multi-dimensional evaluation framework assessing correctness, defect-detection capability, and practical utility. Through an integrated LLM-generation and verification pipeline, we empirically demonstrate that the generated contracts maintain high correctness while substantially reducing false alarms compared to postcondition-only baselines. This work establishes LLM-inferred contracts as a viable basis for automatic software verification and supports this with empirical evidence.

📝 Abstract
Automatic software verifiers have become increasingly effective at checking software against (formal) specifications. Yet, their adoption in practice has been hampered by the lack of such specifications in real-world code. Large Language Models (LLMs) have shown promise in inferring formal postconditions from natural language hints embedded in code, such as function names, comments, or documentation. Using the generated postconditions as specifications in a subsequent verification, however, often leads verifiers to suggest invalid inputs, hinting at potential issues that ultimately turn out to be false alarms. To address this, we revisit the problem of specification inference from natural language in the context of automatic software verification. In the process, we introduce NL2Contract, the task of employing LLMs to translate informal natural language into formal functional contracts, consisting of postconditions as well as preconditions. We introduce metrics to validate and compare different NL2Contract approaches, using the soundness of the generated contracts, their bug-discriminative power, and their usability in automatic software verification as key criteria. We evaluate NL2Contract with different LLMs and compare it to the postcondition generation task nl2postcond. Our evaluation shows that (1) LLMs are generally effective at generating functional contracts that are sound for all possible inputs, (2) the generated contracts are sufficiently expressive to discriminate buggy from correct behavior, and (3) verifiers supplied with LLM-inferred functional contracts produce fewer false alarms than when provided with postconditions alone. Further investigations show that LLM-inferred preconditions generally align well with developers' intentions, which allows us to use automatic software verifiers to catch real-world bugs.
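For concreteness, here is a minimal sketch of what such an inferred functional contract can look like, written as an ACSL annotation of the kind consumed by C verifiers such as Frama-C. The function, its comment, and the contract clauses are illustrative, not taken from the paper's benchmarks:

/* Returns the largest element of the array a of length len.
 * The function name and this comment are the kind of natural-language
 * cues an NL2Contract model would translate into the contract below. */
/*@ requires len > 0;
  @ requires \valid_read(a + (0 .. len - 1));
  @ ensures \forall integer i; 0 <= i < len ==> \result >= a[i];
  @ ensures \exists integer i; 0 <= i < len && \result == a[i];
  @*/
int max_element(const int *a, int len) {
    int m = a[0];
    for (int i = 1; i < len; i++) {
        if (a[i] > m)
            m = a[i];
    }
    return m;
}

The requires clauses (preconditions) confine the verifier to the inputs the developer intended; the ensures clauses (postconditions) pin down the intended result.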
Problem

Research questions and friction points this paper is trying to address.

LLMs generate formal contracts from natural language for software verification
Addresses false alarms in verification by inferring preconditions alongside postconditions (see the sketch after this list)
Evaluates contract soundness and bug detection capability in real code
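To see why the inferred preconditions matter, consider the same illustrative function annotated only with postconditions, as in the postcondition-only setting (nl2postcond) that the paper compares against:

/*@ ensures \forall integer i; 0 <= i < len ==> \result >= a[i];
  @ ensures \exists integer i; 0 <= i < len && \result == a[i];
  @*/
int max_element(const int *a, int len) {
    int m = a[0];   /* invalid read if len == 0 or a is NULL */
    for (int i = 1; i < len; i++) {
        if (a[i] > m)
            m = a[i];
    }
    return m;
}

Left free to choose any input, a verifier can instantiate len == 0 or a NULL pointer, report the read of a[0] as a bug, and thus raise an alarm for inputs no caller would ever pass. Inferred requires clauses such as those in the sketch above rule these unintended inputs out, which is the false-alarm reduction the paper evaluates.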
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate complete functional contracts from natural language
Generated contracts include both preconditions and postconditions
Contracts reduce false alarms and improve bug detection
Cedric Richter
Carl von Ossietzky Universität Oldenburg, Germany
Heike Wehrheim
University of Oldenburg
Formal methods · software verification · weak memory models