🤖 AI Summary
Large language models (LLMs) lack reliable mechanisms to verify that their outputs satisfy required constraints: sampling-based estimates give only empirical intuition and provide no sound probabilistic guarantees.
Method: We propose the first practical, deterministic verification framework for LLM output constraints, yielding theoretically grounded, tight upper bounds on violation probabilities, replacing unreliable sampling. Our approach combines prefix-closed semantic constraint modeling, systematic search over the generation space, and novel data structures (token tries and frontier sets) that maintain provably sound probability bounds at each decoding step.
Results: Evaluated on multiple state-of-the-art LLMs, our framework achieves 6–8× tighter probability bounds than prior methods and identifies 3–4× more high-risk instances than baselines under identical computational budgets. It enables precise risk assessment for correctness, privacy, and safety, supporting rigorous, computationally efficient certification of constrained LLM behavior.
📝 Abstract
As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide an intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification, and secure code generation tasks across multiple state-of-the-art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high-risk instances compared to baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.