🤖 AI Summary
Existing AI systems struggle to generate software with formal correctness guarantees, particularly in distributed settings where a property must hold under every possible interleaving of events. This work proposes Inductive Deductive Synthesis (IDS), presented by the authors as the first effective approach to this problem: an agentic LLM system that jointly and incrementally synthesizes an implementation and its proof. IDS also learns from failed attempts, using that feedback to systematically try promising synthesis strategies. Evaluated on seven distributed key-value-store specifications, IDS produced fully verified implementations in all seven cases, averaging about 6.8 hours and $106 per specification, roughly 200 times faster than expert effort. By feeding performance measurements into the same loop, it also yielded implementations up to 3× faster than published verified systems. These results indicate that generating formally verified distributed systems can be automated efficiently.
📝 Abstract
AI agents increasingly excel at generating, testing, and refining code. However, they fall short on tasks requiring formal guarantees of full coverage that testing alone cannot provide. Distributed systems are a prime example: properties such as consistency between reads and writes must hold under every possible interleaving of events. Mechanized formal verification can guarantee such correctness, but typically demands months to years of expert effort. As evidence, even SOTA coding agents (Codex with GPT-5.4 and Claude Code with Opus 4.6) succeed on only 2/7 distributed key-value-store specifications. In this paper, we present the first effective approach to addressing this gap, Inductive Deductive Synthesis (IDS), which jointly and incrementally synthesizes implementation and proof, and learns from failed attempts to systematically try promising strategies. Built as an agentic LLM system, IDS achieves 7/7 in about 6.8 hours and $106 per spec on average, roughly 200x faster than expert effort and 17% cheaper than SOTA agents. IDS further incorporates performance feedback into the same loop, yielding implementations up to 3x faster than published verified systems.
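To give a rough sense of why testing alone cannot cover "every possible interleaving of events", the sketch below counts the distinct interleavings of a few independent processes. It is an illustrative calculation, not part of IDS: the function name `count_interleavings` is hypothetical, and the model assumes each process runs a fixed sequence of atomic steps with no crashes, message loss, or reordering, all of which would enlarge the space further.

```python
from math import factorial


def count_interleavings(num_processes: int, ops_per_process: int) -> int:
    """Count the ways to interleave the steps of independent processes.

    Assumption (for illustration only): every process executes exactly
    `ops_per_process` atomic steps in a fixed order, so the count is the
    multinomial coefficient (k*n)! / (n!)^k.
    """
    total_ops = num_processes * ops_per_process
    return factorial(total_ops) // factorial(ops_per_process) ** num_processes


if __name__ == "__main__":
    # Show how quickly the schedule space grows for three processes.
    for ops in (1, 2, 3, 5, 10):
        print(f"3 processes x {ops} ops: {count_interleavings(3, ops)} interleavings")
```

Even under these simplifying assumptions the count grows factorially with the number of steps, which is why a proof that quantifies over all schedules gives a guarantee that sampling schedules in tests does not.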