BRIDGE: Building Representations In Domain Guided Program Verification

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) excel at code generation but face scalability bottlenecks in interactive proof frameworks (e.g., Lean4) for program verification—particularly in jointly generating high-quality code, formal specifications, and machine-checkable proofs. This work introduces the first systematic, structured prompting paradigm for scalable verified program generation, explicitly decoupling and semantically aligning three core elements: functional implementation, specification, and formal proof. It constructs three intermediate reasoning representations—function-driven, specification-driven, and proof-oriented—to enable coordinated generation. A domain-specific prompting strategy implements this tripartite co-generation in both Lean4 and Python. Experiments demonstrate significant improvements: in Lean4, code correctness (pass@5) increases by 1.47× and inference efficiency doubles; in Python, task pass rates improve by up to 17.5%, substantially reducing sampling overhead.

📝 Abstract
Large language models (LLMs) have achieved impressive results in code generation, yet struggle with program verification, especially in interactive proof frameworks such as Lean4. A central challenge is scalability: verified synthesis requires not just code, but also precise specifications and correctness proofs, and existing approaches rarely span all three domains. We present BRIDGE, the first systematic study of structured prompting for scalable verified program generation. BRIDGE decomposes verification into three interconnected domains: Code (executable implementations), Specifications (formal intent statements), and Proofs (constructive correctness arguments). Our key idea is to elicit distinct reasoning behaviors (functional, specification-driven, and proof-oriented) as intermediate representations that preserve semantic structure and connect these domains. Through systematic ablations, we show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods. For example, functional reasoning improves correctness of code in formal languages (Lean4) by nearly 1.5x (pass@5) over direct baselines. In inference-time compute, functional reasoning is also 2x more efficient, achieving higher pass rates with fewer generations and lower total sampling budgets. Similarly, we find that specification-driven prompting boosts Python coding pass rates by up to 17.5%. These findings suggest that structured domain alignment is a promising direction for advancing verified synthesis. BRIDGE establishes a foundation for training via expert iteration or RLVR, enabling models to internalize these reasoning strategies across code, specifications, and proofs.
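The specification-driven idea in the abstract — state the intended properties first, then generate and accept an implementation only if it satisfies them — can be sketched in plain Python. This is an illustrative sketch, not BRIDGE's actual pipeline; the function names (`spec_for_abs`, `candidate_abs`, `check`) are hypothetical stand-ins, with the candidate implementation standing in for an LLM generation.

```python
from typing import Callable, List, Tuple

# Specification: (input, expected output) properties, derived from the task
# intent *before* any implementation exists.
Spec = List[Tuple[int, int]]

def spec_for_abs() -> Spec:
    """Hypothetical specification for an absolute-value function."""
    return [(-3, 3), (0, 0), (7, 7)]

def candidate_abs(x: int) -> int:
    """Candidate implementation (stands in for an LLM-generated sample)."""
    return x if x >= 0 else -x

def check(impl: Callable[[int], int], spec: Spec) -> bool:
    """Accept a candidate only if it satisfies every specified property."""
    return all(impl(x) == y for x, y in spec)

print(check(candidate_abs, spec_for_abs()))  # True
```

Because the specification is fixed before sampling, failed candidates can be rejected cheaply and resampled, which is one way such a setup can reduce total sampling budget.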
Problem

Research questions and friction points this paper is trying to address.

Improving program verification scalability in interactive proof frameworks like Lean4
Generating verified code with precise specifications and correctness proofs simultaneously
Addressing LLM limitations in connecting code, specifications, and proof domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes verification into Code, Specifications, Proofs
Uses domain-guided prompting for distinct reasoning behaviors
Improves accuracy and efficiency in verified program generation
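The Code / Specifications / Proofs decomposition above can be made concrete with a minimal Lean4 triple. This is an illustrative sketch, not an example from the paper: a toy implementation, a formal statement of its intent, and a machine-checked proof.

```lean
-- Code: an executable implementation
def double (n : Nat) : Nat := n + n

-- Specification: a formal statement of intent
-- Proof: a constructive correctness argument, checked by Lean
theorem double_spec (n : Nat) : double n = 2 * n := by
  simp [double]
  omega
```

In BRIDGE's framing, the generation task is not just producing `double`, but keeping all three elements semantically aligned so the proof obligation actually discharges.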