TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

📅 2026-03-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the critical gap in unified security frameworks for large language model (LLM)-based multi-agent systems, which currently lack comprehensive protection and evaluation mechanisms against threats spanning individual agents, inter-agent interactions, and system-wide levels. The paper introduces the first tripartite security framework tailored for multi-agent systems, establishing a three-tiered, fine-grained risk taxonomy aligned with OWASP standards that encompasses 20 distinct risk categories. By integrating an MAS abstraction layer, customizable testing modules, a unified LLM-based adjudication factory, structured trajectory analysis, and automated attack probe generation, the framework enables end-to-end security coverage—from pre-deployment assessment to runtime monitoring. Demonstrated across multiple representative platforms, the approach supports cross-platform adaptability, generates detailed vulnerability reports, triggers real-time alerts, and significantly enhances overall system security and reliability.

Technology Category

Application Category

📝 Abstract
With the rapid development of LLM-based multi-agent systems (MAS), their significant safety and security concerns have emerged, which introduce novel risks going beyond single agents or LLMs. Despite attempts to address these issues, the existing literature lacks a cohesive safeguarding system specialized for MAS risks. In this work, we introduce TrinityGuard, a comprehensive safety evaluation and monitoring framework for LLM-based MAS, grounded in the OWASP standards. Specifically, TrinityGuard encompasses a three-tier fine-grained risk taxonomy that identifies 20 risk types, covering single-agent vulnerabilities, inter-agent communication threats, and system-level emergent hazards. Designed for scalability across various MAS structures and platforms, TrinityGuard is organized in a trinity manner, involving an MAS abstraction layer that can be adapted to any MAS structures, an evaluation layer containing risk-specific test modules, alongside runtime monitor agents coordinated by a unified LLM Judge Factory. During Evaluation, TrinityGuard executes curated attack probes to generate detailed vulnerability reports for each risk type, where monitor agents analyze structured execution traces and issue real-time alerts, enabling both pre-development evaluation and runtime monitoring. We further formalize these safety metrics and present detailed case studies across various representative MAS examples, showcasing the versatility and reliability of TrinityGuard. Overall, TrinityGuard acts as a comprehensive framework for evaluating and monitoring various risks in MAS, paving the way for further research into their safety and security.
Problem

Research questions and friction points this paper is trying to address.

multi-agent systems
safety
security
LLM-based MAS
risk taxonomy
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent systems
safety evaluation
risk taxonomy
runtime monitoring
LLM-based security
K
Kai Wang
Shanghai AI Laboratory
B
Biaojie Zeng
Shanghai AI Laboratory
Zeming Wei
Zeming Wei
Ph.D. Candidate, Peking University
Trustworthy AIAdversarial RobustnessExplainability
C
Chang Jin
Shanghai AI Laboratory
Hefeng Zhou
Hefeng Zhou
上海交通大学
AIEA
X
Xiangtian Li
Shanghai AI Laboratory
Chao Yang
Chao Yang
Research Scientist in Shanghai AI Laboratory
LLM SafetyMulti-modalRoboticsReinforcement Learning
J
Jingjing Qu
Shanghai AI Laboratory
X
Xingcheng Xu
Shanghai AI Laboratory
Xia Hu
Xia Hu
Google DeepMind
Deep LearningMachine LearningMultimodal