🤖 AI Summary
Problem: LLM-driven multi-agent systems (MAS) that automate collaborative tasks such as meeting scheduling face security and privacy risks, including misalignment, malicious agents, compromised communication, and data poisoning.
Method: This paper introduces Terrarium, a modular, blackboard-based testbed for studying safety, privacy, and security in LLM-based MAS. It supports fine-grained risk modeling, configurable collaboration protocols, and simulation of four representative attacks, integrated with dynamic access control and information-sharing mechanisms.
Contribution/Results: Three collaborative MAS scenarios, implemented with four representative attacks, demonstrate the framework's flexibility. The framework provides tools to rapidly prototype, evaluate, and iterate on defenses and designs. Its core idea is to repurpose the classical blackboard architecture as a modular, configurable experimental foundation for secure MAS, with the aim of accelerating research on trustworthy LLM-based MAS.
📝 Abstract
A multi-agent system (MAS) powered by large language models (LLMs) can automate tedious user tasks, such as meeting scheduling, that require inter-agent collaboration. LLMs enable nuanced protocols that account for unstructured private data, user constraints, and preferences. However, this design introduces new risks, including misalignment and attacks by malicious parties that compromise agents or steal user data. In this paper, we propose the Terrarium framework for the fine-grained study of safety, privacy, and security in LLM-based MAS. We repurpose the blackboard design, an early approach in multi-agent systems, to create a modular, configurable testbed for multi-agent collaboration. We identify key attack vectors such as misalignment, malicious agents, compromised communication, and data poisoning. We implement three collaborative MAS scenarios with four representative attacks to demonstrate the framework's flexibility. By providing tools to rapidly prototype, evaluate, and iterate on defenses and designs, Terrarium aims to accelerate progress toward trustworthy multi-agent systems.
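
To make the blackboard idea concrete, the following is a minimal sketch of a shared blackboard with per-topic write permissions, plus a simulated data-poisoning attempt by an unauthorized agent. It is not Terrarium's actual API: the `Blackboard` class, its `writers` access list, and the agent names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Hypothetical shared store that agents post to and read from."""

    # Assumed access-control list: topic -> agent names allowed to write.
    writers: dict[str, set[str]]
    # Each post is recorded as (agent, topic, message).
    posts: list[tuple[str, str, str]] = field(default_factory=list)

    def post(self, agent: str, topic: str, message: str) -> bool:
        # Reject writes from agents that are not authorized for the topic.
        if agent not in self.writers.get(topic, set()):
            return False
        self.posts.append((agent, topic, message))
        return True

    def read(self, topic: str) -> list[str]:
        # Return every accepted message for the topic, in posting order.
        return [msg for _, t, msg in self.posts if t == topic]


board = Blackboard(writers={"schedule": {"alice_agent", "bob_agent"}})

# A legitimate agent shares its user's availability.
accepted = board.post("alice_agent", "schedule", "Alice is free Tue 10:00")

# A simulated malicious agent tries to poison the shared schedule.
blocked = board.post("mallory_agent", "schedule", "Everyone is free Sun 03:00")
```

Because every message passes through one shared structure, an experimenter could swap the permission check for a different policy, or remove it entirely, to compare how an attack behaves with and without a given defense.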