QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems

📅 2025-12-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In multi-agent systems, LLM-based agents face safety risks during tool invocation, multi-step planning, and cross-agent collaboration; natural-language safety policies are ambiguous and context-dependent, which hinders their translation into verifiable, machine-checkable rules. Method: We propose a sequent calculus-based formalization framework that automatically compiles safety policies into logical sequents checked by runtime monitoring. We design a four-role collaborative guardian architecture, comprising state tracking, policy verification, threat monitoring, and arbitration, augmented by a hierarchical conflict-resolution mechanism that dynamically updates the top-k predicates. Contribution/Results: Evaluated on the ST-WebAgentBench and AgentHarm benchmarks, our approach significantly improves guardrail accuracy and rule recall while reducing false positives, outperforming single-agent baselines such as ShieldAgent in overall safety-control efficacy.
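The summary's core idea, compiling a safety policy into a logical sequent whose predicates range over observable agent state, can be sketched roughly as follows. The `Sequent` representation and the predicate names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the compile-then-check pattern: a policy becomes a
# sequent "antecedents |- consequent" over predicates on observable state.
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]  # observable agent state (tool name, args, history, ...)

@dataclass
class Sequent:
    """If all antecedent predicates hold, the consequent must also hold
    for the pending action to be considered safe."""
    antecedents: List[Callable[[State], bool]]
    consequent: Callable[[State], bool]

    def check(self, state: State) -> bool:
        # Vacuously safe when some antecedent fails; otherwise the
        # consequent decides.
        if not all(p(state) for p in self.antecedents):
            return True
        return self.consequent(state)

# Toy policy: "file-deletion tools may only touch the sandbox directory"
def is_delete(s: State) -> bool:
    return s.get("tool") == "delete_file"

def in_sandbox(s: State) -> bool:
    return str(s.get("path", "")).startswith("/sandbox/")

rule = Sequent(antecedents=[is_delete], consequent=in_sandbox)
print(rule.check({"tool": "delete_file", "path": "/etc/passwd"}))      # False -> block
print(rule.check({"tool": "delete_file", "path": "/sandbox/tmp.txt"}))  # True -> allow
```

A runtime monitor would evaluate such sequents against the tracked state before each tool call, blocking actions whose `check` returns `False`.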

📝 Abstract
Safety risks arise as large language model-based agents solve complex tasks with tools, multi-step plans, and inter-agent messages. However, deployer-written policies in natural language are ambiguous and context-dependent, so they map poorly to machine-checkable rules, and runtime enforcement is unreliable. Expressing safety policies as sequents, we propose QuadSentinel, a four-agent guard (state tracker, policy verifier, threat watcher, and referee) that compiles these policies into machine-checkable rules built from predicates over observable state and enforces them online. Referee logic plus an efficient top-k predicate updater keeps costs low by prioritizing checks and resolving conflicts hierarchically. Measured on ST-WebAgentBench (ICML CUA '25) and AgentHarm (ICLR '25), QuadSentinel improves guardrail accuracy and rule recall while reducing false positives. Against single-agent baselines such as ShieldAgent (ICML '25), it yields better overall safety control. Near-term deployments can adopt this pattern without modifying core agents by keeping policies separate and machine-checkable. Our code will be made publicly available at https://github.com/yyiliu/QuadSentinel.
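The four-agent guard the abstract names can be pictured as a small pipeline: the state tracker folds agent events into observable state, the policy verifier and threat watcher each emit a verdict, and the referee resolves them hierarchically. A minimal sketch, in which the interfaces, the toy rules, and the "any block wins" referee policy are all illustrative assumptions:

```python
# Hypothetical sketch of the four-role guard pipeline described above.
from typing import Dict, List, Tuple

State = Dict[str, object]
Verdict = str  # "allow" | "block"

def track_state(event: Dict[str, object], state: State) -> State:
    """State tracker: fold the latest agent event into the observable state."""
    return {**state, **event}

def verify_policy(state: State) -> Verdict:
    """Policy verifier: toy rule -- block deletions outside the sandbox."""
    if state.get("tool") == "delete_file" and \
            not str(state.get("path", "")).startswith("/sandbox/"):
        return "block"
    return "allow"

def watch_threats(state: State) -> Verdict:
    """Threat watcher: toy heuristic -- flag outbound network tool calls."""
    return "block" if state.get("tool") == "http_post" else "allow"

def referee(verdicts: List[Verdict]) -> Verdict:
    """Referee: resolve conflicts hierarchically; any block outranks allow."""
    return "block" if "block" in verdicts else "allow"

def guard(event: Dict[str, object], state: State) -> Tuple[Verdict, State]:
    state = track_state(event, state)
    return referee([verify_policy(state), watch_threats(state)]), state
```

Because the guard only reads events and emits verdicts, it sits beside the core agents rather than inside them, matching the abstract's point that policies stay separate and machine-checkable.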
Problem

Research questions and friction points this paper is trying to address.

Ensuring safety in multi-agent systems with machine-checkable rules
Resolving ambiguity in natural language safety policies for reliable enforcement
Reducing false positives while improving guardrail accuracy and rule recall
Innovation

Methods, ideas, or system contributions that make the work stand out.

QuadSentinel uses a four-agent guard (state tracker, policy verifier, threat watcher, referee) for safety control
It compiles natural-language policies into machine-checkable rules and enforces them online
A top-k predicate updater prioritizes checks and resolves conflicts hierarchically to keep costs low
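The cost-reduction idea above, checking only the k most relevant predicates per step, can be sketched as a simple scored scheduler. The scoring scheme (decay plus a boost for predicates that fire) and all names here are illustrative assumptions, not the paper's actual updater.

```python
# Hypothetical sketch of a top-k predicate updater: keep a relevance score
# per predicate and evaluate only the k highest-scoring ones each step.
import heapq
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]

class TopKChecker:
    def __init__(self, predicates: Dict[str, Callable[[State], bool]], k: int):
        # Each predicate returns True when it flags a violation.
        self.predicates = predicates
        self.scores = {name: 1.0 for name in predicates}  # uniform prior
        self.k = k

    def step(self, state: State) -> List[Tuple[str, bool]]:
        # Check only the k currently most relevant predicates.
        top = heapq.nlargest(self.k, self.scores, key=self.scores.get)
        results = [(name, self.predicates[name](state)) for name in top]
        # Decay all checked scores; boost predicates that fired so they stay hot.
        for name, violated in results:
            self.scores[name] = 0.9 * self.scores[name] + (1.0 if violated else 0.0)
        return results

preds = {
    "deletes_outside_sandbox": lambda s: s.get("tool") == "delete_file"
        and not str(s.get("path", "")).startswith("/sandbox/"),
    "outbound_post": lambda s: s.get("tool") == "http_post",
}
checker = TopKChecker(preds, k=2)
print(checker.step({"tool": "delete_file", "path": "/etc/passwd"}))
```

After a step, violated predicates carry higher scores than quiet ones, so under a smaller k the checker concentrates its budget on the rules most likely to matter next.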
Yiliu Yang
The Chinese University of Hong Kong
Yilei Jiang
The Chinese University of Hong Kong
Qunzhong Wang
The Chinese University of Hong Kong
Yingshui Tan
Alibaba Group
Xiaoyong Zhu
Jiangsu University
Electrical Machines · Electrical Vehicle
Sherman S. M. Chow
The Chinese University of Hong Kong
Bo Zheng
Alibaba Group
Xiangyu Yue
The Chinese University of Hong Kong / UC Berkeley / Stanford University / NJU
Artificial Intelligence · Computer Vision · Multi-modal Learning