Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking

📅 2025-08-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language model (LLM) agents exhibit unpredictable safety risks due to their inherent stochasticity, yet existing rule-based runtime safeguards (e.g., AgentSpec) lack foresight and fail to handle long-horizon dependencies or distribution shifts. Method: We propose the first proactive runtime safety framework for LLM agents grounded in probabilistic reachability analysis, introducing probabilistic model checking to LLM agent safety. Our approach constructs discrete-time Markov chains via symbolic abstraction to enable long-horizon risk anticipation; it jointly incorporates PAC-bound estimation and semantic validity checking to support configurable safety–task trade-offs. Results: Experiments demonstrate that our framework blocks up to 93.6% of unsafe behaviors in home service agents before execution; in autonomous driving scenarios, it achieves 100% prediction accuracy for traffic violations and collisions, with an average warning lead time of 38.66 seconds, while maintaining high task completion rates.

📝 Abstract
Large Language Model (LLM) agents exhibit powerful autonomous capabilities across domains such as robotics, virtual assistants, and web automation. However, their stochastic behavior introduces significant safety risks that are difficult to anticipate. Existing rule-based enforcement systems, such as AgentSpec, focus on developing reactive safety rules, which typically respond only when unsafe behavior is imminent or has already occurred. These systems lack foresight and struggle with long-horizon dependencies and distribution shifts. To address these limitations, we propose Pro2Guard, a proactive runtime enforcement framework grounded in probabilistic reachability analysis. Pro2Guard abstracts agent behaviors into symbolic states and learns a Discrete-Time Markov Chain (DTMC) from execution traces. At runtime, it anticipates future risks by estimating the probability of reaching unsafe states, triggering interventions before violations occur when the predicted risk exceeds a user-defined threshold. By incorporating semantic validity checks and leveraging PAC bounds, Pro2Guard ensures statistical reliability while approximating the underlying ground-truth model. We evaluate Pro2Guard extensively across two safety-critical domains: embodied household agents and autonomous vehicles. In embodied agent tasks, Pro2Guard enforces safety early in up to 93.6% of unsafe tasks using low thresholds, while configurable modes (e.g., reflect) allow balancing safety with task success, maintaining up to 80.4% task completion. In autonomous driving scenarios, Pro2Guard achieves 100% prediction of traffic law violations and collisions, anticipating risks up to 38.66 seconds ahead.
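The core loop described in the abstract (learn a DTMC from symbolic execution traces, then estimate at runtime the probability of reaching an unsafe state and intervene above a threshold) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the state names, the bounded-horizon recursion, the maximum-likelihood transition fit, and the 0.2 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict

def learn_dtmc(traces):
    """Fit DTMC transition probabilities as empirical frequencies
    of transitions observed in symbolic execution traces."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for s, t in zip(trace, trace[1:]):
            counts[s][t] += 1
    return {s: {t: c / sum(nxt.values()) for t, c in nxt.items()}
            for s, nxt in counts.items()}

def reach_probability(dtmc, state, unsafe, horizon):
    """Probability of reaching any unsafe state from `state`
    within `horizon` steps (bounded reachability)."""
    if state in unsafe:
        return 1.0
    if horizon == 0 or state not in dtmc:
        return 0.0
    return sum(p * reach_probability(dtmc, t, unsafe, horizon - 1)
               for t, p in dtmc[state].items())

# Toy household-agent traces; "stove_on_unattended" is the unsafe state.
traces = [
    ["idle", "cooking", "done"],
    ["idle", "cooking", "stove_on_unattended"],
    ["idle", "cooking", "done"],
    ["idle", "cleaning", "done"],
]
dtmc = learn_dtmc(traces)
risk = reach_probability(dtmc, "idle", {"stove_on_unattended"}, horizon=5)
THRESHOLD = 0.2  # illustrative user-defined threshold
if risk > THRESHOLD:
    print(f"intervene: predicted risk {risk:.2f} exceeds threshold")
# → intervene: predicted risk 0.25 exceeds threshold
```

In practice a model checker (e.g., PRISM or Storm) would compute these reachability probabilities over the full DTMC rather than by naive recursion, but the decision rule is the same: intervene as soon as the predicted risk crosses the threshold, before the violation occurs.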
Problem

Research questions and friction points this paper is trying to address.

Addresses safety risks in LLM agents' stochastic behavior
Overcomes limitations of reactive rule-based safety systems
Proactively predicts and prevents unsafe states via probabilistic analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proactive runtime enforcement via probabilistic model checking
Abstracts behaviors into symbolic states and learns DTMC
Anticipates future risks with semantic validity checks
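The PAC-bound ingredient mentioned above answers a practical question: how many trace samples are enough for the learned transition probabilities to be trustworthy? A standard Hoeffding-style bound gives one such estimate; note this is an illustrative textbook bound, and the paper's exact PAC formulation may differ.

```python
import math

def pac_sample_size(epsilon: float, delta: float) -> int:
    """Hoeffding-style sample bound: with this many i.i.d. observations
    of a transition, the empirical frequency deviates from the true
    probability by more than `epsilon` with probability at most `delta`:
    n >= ln(2/delta) / (2 * epsilon^2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# e.g., estimating each transition probability to within ±0.05
# at 95% confidence requires:
print(pac_sample_size(epsilon=0.05, delta=0.05))  # → 738
```

Tighter or looser thresholds trade sample cost against precision: halving epsilon roughly quadruples the number of traces needed.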
Haoyu Wang
School of Computing and Information Systems, Singapore Management University, Singapore
Christopher M. Poskitt
Singapore Management University (SMU)
software engineering, software testing, formal methods, cybersecurity, graph transformation
Jun Sun
School of Computing and Information Systems, Singapore Management University, Singapore
Jiali Wei
Xi'an Jiaotong University
AI Testing, AI Security