Certified Program Synthesis with a Multi-Modal Verifier

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Program specifications generated from natural language are often too weak to be meaningful or too strong to be implementable, and existing approaches are tied to a single verification paradigm, which forces a trade-off between automation and expressiveness. The paper proposes LeetProof, an agentic certified program synthesis pipeline built on Velvet, a multi-modal verifier embedded in Lean that unifies dynamic testing, automated reasoning, and interactive proving. The pipeline validates specifications with randomised property-based testing, decomposes synthesis tasks using verification conditions, and delegates residual proof obligations to AI provers specialised for Lean. Experiments show that specification validation uncovers defects in existing reference benchmarks, and that the staged pipeline achieves a significantly higher rate of fully certified solutions than a single-mode baseline at the same budget, consistently across two frontier LLM backends.

📝 Abstract

Certified program synthesis (aka vericoding) is the process of automatically generating a program, its formal specification, and a machine-checkable proof of their alignment from a natural-language description. Two challenges make vericoding difficult. First, specifications synthesised from natural language are often either too weak to be meaningful or too strong to be implementable, yet existing approaches lack systematic means to detect such defects. Second, the landscape of program verifiers is fragmented: each tool supports a particular reasoning mode -- auto-active (e.g., Dafny, Verus) or interactive (e.g., Coq, Lean) -- with its own trade-off between automation and expressivity. This forces every synthesis methodology to be tailored to a single verification paradigm, limiting the class of tasks it can handle effectively. We overcome both challenges by structuring the certified synthesis workflow around a multi-modal verifier -- a single tool combining dynamic validation, automated proofs, and interactive proof scripting in one foundational framework. We realise this idea in LeetProof, an agentic pipeline built on Velvet, a multi-modal verifier embedded in Lean. Multi-modality enables LeetProof to validate generated specifications via randomised property-based testing before any code is synthesised, decompose the synthesis task into sub-problems guided by verification conditions, and delegate residual proof obligations to frontier AI provers specialised for Lean. We evaluate LeetProof on benchmarks derived from prior work on certified synthesis. Our specification validation uncovers defects in existing reference benchmarks, and LeetProof's staged pipeline achieves a significantly higher rate of fully certified solutions than a single-mode baseline at the same budget -- consistently across two frontier LLM backends.
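
To make the two specification defects concrete, here is a minimal sketch of how randomised testing might flag a specification as too weak or too strong before any code is synthesised. It is written in plain Python and is not the Velvet or LeetProof API; the task ("return the maximum of a non-empty list"), the names `spec_weak`, `spec_strong`, `spec_good`, and `validate`, and the use of a trusted oracle for expected outputs are all assumptions made for illustration. The sketch also assumes the correct output is unique for each input, so any other accepted output signals a weak specification.

```python
import random

# Hypothetical candidate postconditions for "return the maximum of a non-empty list".

def spec_weak(xs, out):
    # Only an upper bound: forgets that the result must occur in xs.
    return all(x <= out for x in xs)

def spec_strong(xs, out):
    # Strict bound: no element of xs can ever satisfy it.
    return all(x < out for x in xs)

def spec_good(xs, out):
    # Upper bound plus membership pins down exactly the maximum.
    return out in xs and all(x <= out for x in xs)

def validate(spec, oracle, trials=200, seed=0):
    """Randomly test a spec against a trusted oracle (an assumption of this sketch).

    Returns a (verdict, input, output) triple describing the first defect found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]
        expected = oracle(xs)
        # The spec rejects a known-correct output: it is too strong.
        if not spec(xs, expected):
            return ("too strong", xs, expected)
        # The spec accepts some incorrect output: it is too weak.
        for wrong in range(-7, 8):
            if wrong != expected and spec(xs, wrong):
                return ("too weak", xs, wrong)
    return ("no defect found", None, None)

if __name__ == "__main__":
    for name, spec in [("weak", spec_weak), ("strong", spec_strong), ("good", spec_good)]:
        print(name, validate(spec, max))
```

Passing such a check does not prove a specification correct; it only means that no counterexample was found within the sampled inputs and candidate outputs.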

Problem

Research questions and friction points this paper is trying to address.

certified program synthesis

specification defects

multi-modal verification

verifier fragmentation

natural-language to formal specification

Innovation

Methods, ideas, or system contributions that make the work stand out.

certified program synthesis

multi-modal verifier

specification validation