🤖 AI Summary
Existing automated formal verification methods operate at the function level and do not address the cross-module dependencies and missing global context found in realistic multi-module projects. To bridge this gap, the authors propose a repository-level automated verification paradigm: (1) they introduce RepoVBench, the first Verus-based repository-level benchmark, with 383 proof completion tasks; (2) they design RagVerus, a context-aware retrieval-augmented generation (RAG) framework that models module dependency graphs, uses context-sensitive prompting, and integrates the Verus verifier, aiming for both scalability and sample efficiency. On RepoVBench, the approach achieves a 27% relative improvement over baselines; under constrained LLM budgets, it triples the proof pass rate on existing benchmarks. The core contribution is the shift from function-level to repository-level formal verification, which enables proof synthesis across modules while retaining the correctness guarantees of the verifier.
📝 Abstract
Scaling automated formal verification to real-world projects requires resolving cross-module dependencies and global contexts, challenges that existing function-centric methods overlook. We introduce RagVerus, a framework that combines retrieval-augmented generation with context-aware prompting to automate proof synthesis for multi-module repositories. It achieves a 27% relative improvement on our novel RepoVBench benchmark, the first repository-level dataset for Verus, with 383 proof completion tasks. RagVerus also triples proof pass rates on existing benchmarks under constrained language model budgets, demonstrating scalable and sample-efficient verification.
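To make the idea of dependency-aware retrieval concrete, the following is a minimal sketch and not RagVerus's actual implementation. It assumes a repository is given as a module dependency map plus per-module source text, gathers the nearest dependencies of the target module by breadth-first search, and places them ahead of the proof task in a prompt. All names here (`collect_context`, `build_prompt`, `DEPS`, `SOURCES`, the toy module names and contents) are hypothetical and chosen only for illustration; a real pipeline would additionally call a language model and check candidates with the Verus verifier.

```python
from collections import deque

# Hypothetical toy repository: module -> modules it depends on.
DEPS = {
    "queue": ["vec_util", "spec"],
    "vec_util": ["spec"],
    "spec": [],
}

# Hypothetical source snippets for each module (placeholders, not real Verus code).
SOURCES = {
    "queue": "fn push(...) { /* proof to complete */ }",
    "vec_util": "fn lemma_len(...) { /* helper lemma */ }",
    "spec": "spec fn well_formed(...) -> bool { /* spec */ }",
}


def collect_context(target, deps, sources, max_modules=3):
    """Walk the dependency graph breadth-first from `target` and return
    (name, source) pairs for the closest dependencies, nearest first."""
    seen = {target}
    queue = deque(deps.get(target, []))
    picked = []
    while queue and len(picked) < max_modules:
        mod = queue.popleft()
        if mod in seen:
            continue  # already visited via another path
        seen.add(mod)
        picked.append(mod)
        queue.extend(deps.get(mod, []))
    return [(m, sources[m]) for m in picked if m in sources]


def build_prompt(target, task, context):
    """Concatenate retrieved dependency sources, then the proof task itself."""
    parts = [f"// module: {name}\n{src}" for name, src in context]
    parts.append(f"// module: {target} (complete the proof)\n{task}")
    return "\n\n".join(parts)


context = collect_context("queue", DEPS, SOURCES, max_modules=2)
prompt = build_prompt("queue", SOURCES["queue"], context)
```

The `max_modules` cap stands in for a limited context or language model budget: retrieving only the nearest dependencies keeps the prompt small while still exposing the specifications and lemmas the target proof is most likely to need.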