AI Summary
Current large language models treat proof generation in formal verification as a static end-to-end prediction task, which precludes the use of program execution feedback and thereby limits their proving capabilities. This work proposes the first large language model framework that integrates counterexample-guided reasoning: upon verification failure, the system automatically generates and validates concrete counterexamples, then uses them to guide the model in generalizing inductive invariants for proof repair. By introducing dynamic, behavior-aware counterexample reasoning into large language model-driven formal verification, this approach significantly improves the accuracy, robustness, and token efficiency of proof generation in Verus, outperforming state-of-the-art prompting strategies.
Abstract
Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present EXVERUS, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback in the form of counterexamples. When a proof fails, EXVERUS automatically generates and validates counterexamples, then guides the LLM to generalize them into inductive invariants that rule out the failing behaviors. Our evaluation shows that EXVERUS significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.
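To make the counterexample-guided idea concrete, here is a minimal, self-contained toy sketch of the refinement loop the abstract describes: candidate invariants are checked against concrete execution states, and a refuting state (the counterexample) signals that the candidate must be generalized. All names (`trace_states`, `search_invariant`, the candidate list) are illustrative assumptions, not the actual EXVERUS API, which drives an LLM and the Verus verifier rather than a fixed candidate list.

```python
# Toy counterexample-guided invariant search (illustrative only; EXVERUS
# itself uses an LLM to generalize invariants and Verus to check proofs).

def trace_states(n):
    """Concrete states (i, s) of the loop: s = 0; for i in 0..=n: s += i."""
    states, s = [], 0
    for i in range(n + 1):
        s += i
        states.append((i, s))
    return states

# Candidate invariants over (i, s), from too specific to actually inductive.
CANDIDATES = [
    ("s == i",                lambda i, s: s == i),
    ("s == 2 * i",            lambda i, s: s == 2 * i),
    ("s == i * (i + 1) / 2",  lambda i, s: s == i * (i + 1) // 2),
]

def search_invariant(candidates, test_bounds=(3, 5, 8)):
    """Return the first candidate that no concrete execution state refutes."""
    for name, inv in candidates:
        cex = next(
            ((i, s)
             for n in test_bounds
             for (i, s) in trace_states(n)
             if not inv(i, s)),
            None,
        )
        if cex is None:
            return name  # no counterexample found: keep this invariant
        # `cex` is a concrete refuting state; in EXVERUS such validated
        # counterexamples are fed back to the LLM to guide generalization.
    return None
```

In this toy, `search_invariant(CANDIDATES)` rejects the first two candidates via concrete counterexamples and settles on the closed-form invariant; the real system replaces the fixed candidate list with LLM-proposed invariants and the concrete checks with Verus verification.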