QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems

📅 2025-12-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In multi-agent systems, LLM-based agents face safety risks when they invoke tools, plan over multiple steps, and collaborate with other agents. Natural-language safety policies are inherently ambiguous and context-dependent, which makes them hard to translate into verifiable, machine-checkable rules.

Method: We propose a sequent-calculus-based framework that formalizes safety policies as logical sequents and automatically compiles them into rules enforced by a runtime monitor. The guard has four collaborating roles (a state tracker, a policy verifier, a threat watcher, and a referee) and is augmented by a hierarchical conflict-resolution mechanism and a top-k predicate updater that decides which predicates to check at each step.

Contribution/Results: On the ST-WebAgentBench and AgentHarm benchmarks, the approach improves guard accuracy and rule recall while reducing false positives, and it achieves better overall safety control than single-agent baselines such as ShieldAgent.
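To make "policies as sequents" concrete, here is a minimal Python sketch. It assumes the classical reading of a sequent Γ ⊢ Δ: if every antecedent predicate in Γ holds in the observed state, at least one succedent predicate in Δ must hold, and otherwise the action is flagged. The `Sequent` class, the `violations` helper, the predicate, and the state fields (`tool`, `recipient`, `user_confirmed`) are all hypothetical illustrations, not QuadSentinel's actual rule format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A predicate maps an observable state (a plain dict in this sketch) to True/False.
Predicate = Callable[[Dict[str, object]], bool]


@dataclass
class Sequent:
    """Hypothetical rule Γ ⊢ Δ: if all antecedents hold, some succedent must hold."""
    name: str
    antecedents: List[Predicate]
    succedents: List[Predicate]

    def holds(self, state: Dict[str, object]) -> bool:
        # If any antecedent is false, the rule does not apply and is trivially satisfied.
        if not all(p(state) for p in self.antecedents):
            return True
        # Classical reading: the conjunction of Γ implies the disjunction of Δ.
        # An empty Δ makes any([]) False, so the rule acts as a pure prohibition.
        return any(q(state) for q in self.succedents)


def violations(rules: List[Sequent], state: Dict[str, object]) -> List[str]:
    """Return the names of the rules that the current state violates."""
    return [r.name for r in rules if not r.holds(state)]


# Hypothetical policy: emailing an external address requires user confirmation.
no_unconfirmed_external_email = Sequent(
    name="no_unconfirmed_external_email",
    antecedents=[
        lambda s: s.get("tool") == "send_email",
        lambda s: not str(s.get("recipient", "")).endswith("@example.com"),
    ],
    succedents=[lambda s: s.get("user_confirmed") is True],
)

if __name__ == "__main__":
    state = {"tool": "send_email", "recipient": "x@other.org", "user_confirmed": False}
    print(violations([no_unconfirmed_external_email], state))
    # prints ['no_unconfirmed_external_email']
```

Because each predicate reads only observable state, rules of this shape can be checked online without inspecting an agent's internals.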

📝 Abstract

Safety risks arise as large language model (LLM)-based agents solve complex tasks with tools, multi-step plans, and inter-agent messages. However, deployer-written policies in natural language are ambiguous and context-dependent, so they map poorly to machine-checkable rules, and runtime enforcement is unreliable. We propose QuadSentinel, a four-agent guard (state tracker, policy verifier, threat watcher, and referee) that expresses safety policies as sequents, compiles them into machine-checkable rules built from predicates over observable state, and enforces them online. Referee logic plus an efficient top-k predicate updater keeps costs low by prioritizing checks and resolving conflicts hierarchically. Measured on ST-WebAgentBench (ICML CUA '25) and AgentHarm (ICLR '25), QuadSentinel improves guardrail accuracy and rule recall while reducing false positives. Against single-agent baselines such as ShieldAgent (ICML '25), it yields better overall safety control. Near-term deployments can adopt this pattern without modifying core agents by keeping policies separate and machine-checkable. Our code will be made publicly available at https://github.com/yyiliu/QuadSentinel.
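The division of labor among the four roles can be sketched as a single guard step that wraps a proposed action. This is a minimal illustration under loudly stated assumptions: each role is a plain Python function rather than an LLM agent, verdicts use a hypothetical three-level scale (`ALLOW`, `ESCALATE`, `BLOCK`), and the referee resolves conflicts with a simple hierarchy in which the strictest verdict wins and ties are broken by a fixed role priority. All function names, state fields, and the precedence order are assumptions, not the paper's design.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Tuple


class Verdict(IntEnum):
    # Hypothetical scale: a higher value means a stricter outcome.
    ALLOW = 0
    ESCALATE = 1
    BLOCK = 2


@dataclass
class Opinion:
    role: str
    verdict: Verdict
    reason: str = ""


# Hypothetical role priority, used only to break ties between equally strict opinions.
ROLE_PRIORITY = {"policy_verifier": 0, "threat_watcher": 1}


def state_tracker(state: Dict[str, object], action: Dict[str, object]) -> Dict[str, object]:
    """Fold the proposed action into a tracked copy of the observable state."""
    new_state = dict(state)
    new_state.update(action)
    new_state["steps"] = int(state.get("steps", 0)) + 1
    return new_state


def policy_verifier(state: Dict[str, object]) -> Opinion:
    # Stand-in for checking compiled rules; here a single hard-coded rule.
    if state.get("tool") == "delete_file" and not state.get("user_confirmed"):
        return Opinion("policy_verifier", Verdict.BLOCK, "deletion without confirmation")
    return Opinion("policy_verifier", Verdict.ALLOW)


def threat_watcher(state: Dict[str, object]) -> Opinion:
    # Stand-in for spotting suspicious inter-agent messages such as prompt injection.
    msg = str(state.get("message", "")).lower()
    if "ignore previous instructions" in msg:
        return Opinion("threat_watcher", Verdict.ESCALATE, "possible injection")
    return Opinion("threat_watcher", Verdict.ALLOW)


def referee(opinions: List[Opinion]) -> Opinion:
    """Hierarchical resolution: strictest verdict first, then role priority."""
    return min(opinions, key=lambda o: (-o.verdict, ROLE_PRIORITY.get(o.role, 99)))


def guard_step(state: Dict[str, object], action: Dict[str, object]) -> Tuple[Dict[str, object], Opinion]:
    """Run the roles on a proposed action without touching the core agent itself."""
    tracked = state_tracker(state, action)
    decision = referee([policy_verifier(tracked), threat_watcher(tracked)])
    return tracked, decision
```

Keeping the guard as a wrapper around `guard_step` mirrors the abstract's point that policies stay separate from the core agents; swapping in real verifiers only changes the role functions.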

Problem

Research questions and friction points this paper is trying to address.

Ensuring safety in multi-agent systems with machine-checkable rules

Resolving ambiguity in natural language safety policies for reliable enforcement

Reducing false positives while improving guardrail accuracy and rule recall

Innovation

Methods, ideas, or system contributions that make the work stand out.

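QuadSentinel uses a four-agent guard (state tracker, policy verifier, threat watcher, referee) for safety control

It compiles natural-language policies into machine-checkable rules and enforces them online

Its referee and top-k predicate updater prioritize checks and resolve conflicts hierarchically to reduce costs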
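The cost-saving idea of checking only the most relevant predicates can be sketched as a top-k selector. As an assumption for illustration, each predicate declares which state fields it reads; whenever fields change, predicates that read them gain relevance, older relevance decays, and only the k highest-scoring predicates are evaluated. The class `TopKPredicateUpdater`, the helper `evaluate_selected`, the scoring rule, and the decay factor are hypothetical; the paper's updater may score and refresh predicates differently.

```python
import heapq
from typing import Callable, Dict, List, Set

Predicate = Callable[[Dict[str, object]], bool]


class TopKPredicateUpdater:
    """Hypothetical sketch: rank predicates by recent relevance, check only the top k."""

    def __init__(self, reads: Dict[str, Set[str]], k: int, decay: float = 0.5):
        self.reads = reads  # predicate name -> state fields the predicate reads
        self.k = k
        self.decay = decay
        self.scores: Dict[str, float] = {name: 0.0 for name in reads}

    def update(self, changed_fields: Set[str]) -> List[str]:
        """Re-score predicates from the fields that just changed; return the top k."""
        for name, fields in self.reads.items():
            # Decay old relevance, then add one point per changed field this predicate reads.
            self.scores[name] = self.scores[name] * self.decay + len(fields & changed_fields)
        # Highest score first; ties broken alphabetically so the order is deterministic.
        return heapq.nsmallest(self.k, self.scores, key=lambda n: (-self.scores[n], n))


def evaluate_selected(
    predicates: Dict[str, Predicate], selected: List[str], state: Dict[str, object]
) -> Dict[str, bool]:
    """Evaluate only the selected predicates, skipping the rest to save cost."""
    return {name: predicates[name](state) for name in selected}
```

Decaying old scores instead of resetting them lets a predicate that mattered one step ago stay in the check list briefly, a simple hedge against violations that unfold over several steps.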
