🤖 AI Summary
Problem: In multi-agent systems, LLM-based agents face safety risks during tool invocation, multi-step planning, and cross-agent collaboration. Natural-language safety policies are inherently ambiguous and context-dependent, which makes them hard to translate into machine-checkable rules. Method: We propose a sequent-based formalization framework that compiles safety policies into logical sequents over observable state and checks them during runtime monitoring. We design a four-role collaborative guard architecture (state tracking, policy verification, threat monitoring, and referee arbitration), supported by hierarchical conflict resolution and an efficient top-k predicate updater that prioritizes checks to keep costs low. Contribution/Results: Evaluated on the ST-WebAgentBench and AgentHarm benchmarks, our approach improves guardrail accuracy and rule recall while reducing false positives, and it provides better overall safety control than single-agent baselines such as ShieldAgent.
📝 Abstract
Safety risks arise as large language model-based agents solve complex tasks with tools, multi-step plans, and inter-agent messages. However, deployer-written policies in natural language are ambiguous and context-dependent, so they map poorly to machine-checkable rules, and runtime enforcement is unreliable. Expressing safety policies as sequents, we propose QuadSentinel, a four-agent guard (state tracker, policy verifier, threat watcher, and referee) that compiles these policies into machine-checkable rules built from predicates over observable state and enforces them online. Referee logic plus an efficient top-$k$ predicate updater keeps costs low by prioritizing checks and resolving conflicts hierarchically. Measured on ST-WebAgentBench (ICML CUA '25) and AgentHarm (ICLR '25), QuadSentinel improves guardrail accuracy and rule recall while reducing false positives. Against single-agent baselines such as ShieldAgent (ICML '25), it yields better overall safety control. Near-term deployments can adopt this pattern without modifying core agents by keeping policies separate and machine-checkable. Our code will be made publicly available at https://github.com/yyiliu/QuadSentinel.
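To make the idea of "rules built from predicates over observable state" concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Sequent` class, the predicate names, the `relevance` scores, and the priority-based `referee` are all hypothetical, and the sequent is simplified to a single consequent. It only illustrates how a policy such as "a delete action requires user confirmation" could be checked online while refreshing just the top-k most relevant predicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]


@dataclass(frozen=True)
class Sequent:
    """Simplified rule `antecedents |- consequent` over named predicates (assumed form)."""
    name: str
    antecedents: Tuple[str, ...]
    consequent: str
    priority: int  # hypothetical convention: lower number takes precedence

    def violated(self, truth: Dict[str, bool]) -> bool:
        # Violated when every antecedent holds but the consequent does not.
        return all(truth[a] for a in self.antecedents) and not truth[self.consequent]


# Hypothetical predicates evaluated on the observable state.
PREDICATES: Dict[str, Callable[[State], bool]] = {
    "is_delete_action": lambda s: s.get("action") == "delete",
    "user_confirmed": lambda s: bool(s.get("confirmed")),
    "is_payment_action": lambda s: s.get("action") == "pay",
}


def update_top_k(truth: Dict[str, bool], state: State,
                 relevance: Dict[str, float], k: int) -> List[str]:
    """Re-evaluate only the k predicates with the highest relevance scores."""
    chosen = sorted(relevance, key=relevance.get, reverse=True)[:k]
    for name in chosen:
        truth[name] = PREDICATES[name](state)
    return chosen


def referee(rules: List[Sequent], truth: Dict[str, bool]) -> Tuple[str, Optional[str]]:
    """Block if any rule is violated; report the highest-precedence violation."""
    violated = [r for r in rules if r.violated(truth)]
    if not violated:
        return "allow", None
    return "block", min(violated, key=lambda r: r.priority).name


RULES = [Sequent("confirm_before_delete", ("is_delete_action",), "user_confirmed", priority=0)]

truth = {name: False for name in PREDICATES}   # cached predicate values
state = {"action": "delete", "confirmed": False}
relevance = {"is_delete_action": 0.9, "user_confirmed": 0.8, "is_payment_action": 0.1}

refreshed = update_top_k(truth, state, relevance, k=2)
verdict = referee(RULES, truth)
print(refreshed, verdict)
```

In this sketch the relevance scores are supplied by hand; in a real guard they would come from whatever component decides which predicates an incoming action or message is likely to affect. Predicates outside the top k keep their cached values, which is where the cost saving would come from.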