VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language model (LLM) agents deployed in safety-critical domains such as healthcare face severe security risks, including objective misalignment, privacy leakage, and adversarial attacks; existing mitigation approaches lack provably correct behavioral compliance guarantees.

Method: We propose a two-phase formal security assurance framework: (1) an offline phase that performs intent parsing and behavior policy synthesis, followed by software testing and formal verification to select only policies satisfying rigorous safety constraints; and (2) an online phase deploying a lightweight runtime monitor that enforces real-time, stepwise compliance checking.

Contribution/Results: This work is the first to tightly integrate end-to-end formal verification with dynamic runtime monitoring, enabling provably secure agent behavior. Experiments demonstrate that the framework maintains inference efficiency while blocking over 99.2% of out-of-bounds actions, significantly enhancing the trustworthiness and robustness of LLM agents in high-stakes scenarios.
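The online phase's stepwise compliance checking can be sketched as a simple allow/deny monitor that gates each proposed action before execution. This is a minimal illustration, not the paper's implementation; the class name, the allowlist representation, and the example actions are all assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class PolicyMonitor:
    """Illustrative runtime monitor: validates each proposed agent
    action against a pre-verified policy (here, a plain allowlist)
    before the action is allowed to execute."""
    allowed_actions: set = field(default_factory=set)
    blocked: list = field(default_factory=list)

    def check(self, action: str) -> bool:
        # Stepwise compliance check: only actions permitted by the
        # pre-verified policy pass; everything else is blocked and logged.
        if action in self.allowed_actions:
            return True
        self.blocked.append(action)
        return False

# Hypothetical healthcare-agent policy: reading and summarizing are
# verified safe; destructive actions are out of bounds.
monitor = PolicyMonitor(allowed_actions={"read_record", "summarize"})
```

Because the expensive verification happens offline, the per-action check at runtime is just a membership test, which is what keeps inference efficiency intact.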

📝 Abstract
The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These agents may deviate from user objectives, violate data handling policies, or be compromised by adversarial attacks. Mitigating these dangers necessitates a mechanism to formally guarantee that an agent's actions adhere to predefined safety constraints, a challenge that existing systems do not fully address. We introduce VeriGuard, a novel framework that provides formal safety guarantees for LLM-based agents through a dual-stage architecture designed for robust and verifiable correctness. The initial offline stage involves a comprehensive validation process. It begins by clarifying user intent to establish precise safety specifications. VeriGuard then synthesizes a behavioral policy and subjects it to both testing and formal verification to prove its compliance with these specifications. This iterative process refines the policy until it is deemed correct. Subsequently, the second stage provides online action monitoring, where VeriGuard operates as a runtime monitor to validate each proposed agent action against the pre-verified policy before execution. This separation of the exhaustive offline validation from the lightweight online monitoring allows formal guarantees to be practically applied, providing a robust safeguard that substantially improves the trustworthiness of LLM agents.
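The offline stage's iterative test-then-verify refinement loop can be sketched as follows. The loop structure (synthesize, test, formally verify, refine on failure) follows the abstract; the callable names, the feedback format, and the iteration cap are assumptions for illustration:

```python
def refine_until_verified(synthesize, test_suite, verify, max_iters=5):
    """Illustrative offline validation loop: synthesize a behavioral
    policy, run it against tests, then attempt formal verification;
    feed failures back into synthesis until a policy is proven correct.
    All three callables are hypothetical stand-ins."""
    feedback = None
    for _ in range(max_iters):
        policy = synthesize(feedback)
        # Software testing pass: cheap checks filter out broken policies.
        failures = [t for t in test_suite if not t(policy)]
        if failures:
            feedback = ("test_failure", failures)
            continue
        # Formal verification pass: prove compliance with the safety
        # specification, or obtain a counterexample to refine against.
        ok, counterexample = verify(policy)
        if ok:
            return policy  # only verified policies are deployed
        feedback = ("verification_failure", counterexample)
    raise RuntimeError("no policy satisfied the safety specification")
```

Separating this exhaustive loop from deployment is what lets the online monitor stay lightweight: by the time a policy reaches runtime, its correctness has already been established.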
Problem

Research questions and friction points this paper is trying to address.

Ensuring LLM agent actions comply with predefined safety constraints
Preventing policy violations and adversarial attacks in sensitive domains
Providing formal safety guarantees through verifiable code generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verified code generation for LLM agent safety
Dual-stage architecture with offline policy validation
Online action monitoring against pre-verified policies