Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies

📅 2025-10-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: LLM-driven multi-agent systems (MAS) face security and privacy risks, including goal misalignment, malicious agents, compromised communication, and data poisoning, particularly in automated collaborative tasks such as meeting scheduling. Method: The paper introduces Terrarium, a modular, blackboard-based testbed for MAS that supports fine-grained risk modeling, configurable collaboration protocols, and simulation of four representative attacks, integrated with dynamic access control and information-sharing mechanisms. Contribution/Results: The authors implement three collaborative MAS scenarios to demonstrate the framework's flexibility. By repurposing the classical blackboard architecture as an experimental foundation, the framework aims to speed up prototyping, evaluating, and iterating on defenses, and to serve as a reproducible platform for robustness and controllability research on LLM-based MAS.

📝 Abstract

A multi-agent system (MAS) powered by large language models (LLMs) can automate tedious user tasks, such as meeting scheduling, that require inter-agent collaboration. LLMs enable nuanced protocols that account for unstructured private data, user constraints, and preferences. However, this design introduces new risks, including misalignment and attacks by malicious parties that compromise agents or steal user data. In this paper, we propose the Terrarium framework for fine-grained study of safety, privacy, and security in LLM-based MAS. We repurpose the blackboard design, an early approach in multi-agent systems, to create a modular, configurable testbed for multi-agent collaboration. We identify key attack vectors such as misalignment, malicious agents, compromised communication, and data poisoning. We implement three collaborative MAS scenarios with four representative attacks to demonstrate the framework's flexibility. By providing tools to rapidly prototype, evaluate, and iterate on defenses and designs, Terrarium aims to accelerate progress toward trustworthy multi-agent systems.
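To make the blackboard idea concrete, here is a minimal Python sketch of a shared board that scheduling agents post to and read from. It is an illustration only and not Terrarium's actual API: the `Blackboard` class, the `schedule` helper, the per-topic reader list, and the agent names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Shared message store. Hypothetical sketch, not Terrarium's API."""
    # topic -> names of agents allowed to read that topic (assumed policy model)
    readers: dict[str, set[str]] = field(default_factory=dict)
    # each post is (topic, author, payload)
    posts: list[tuple[str, str, object]] = field(default_factory=list)

    def post(self, topic, author, payload):
        self.posts.append((topic, author, payload))

    def read(self, topic, agent):
        # Refuse reads from agents that are not on the topic's reader list.
        if agent not in self.readers.get(topic, set()):
            raise PermissionError(f"{agent} may not read {topic}")
        return [(author, payload) for t, author, payload in self.posts if t == topic]


def schedule(board, topic, agent):
    """Return the earliest slot offered in every post on the topic, or None."""
    offers = [set(payload) for _, payload in board.read(topic, agent)]
    common = set.intersection(*offers) if offers else set()
    return min(common) if common else None


board = Blackboard(readers={"availability": {"alice", "bob"}})
board.post("availability", "alice", [9, 10, 14])
board.post("availability", "bob", [10, 14, 16])
meeting_slot = schedule(board, "availability", "alice")
print(meeting_slot)
```

Because all communication passes through one object, an experimenter can observe, restrict, or tamper with messages at a single point, which is roughly what makes a blackboard convenient as a testbed.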

Problem

Research questions and friction points this paper is trying to address.

Studying safety risks in LLM-based multi-agent systems

Addressing privacy threats from malicious agents and attacks

Developing security frameworks against data poisoning and misalignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Repurposed blackboard design for modular testbed

Implemented configurable multi-agent collaboration scenarios

Provided tools for rapid defense prototyping
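As a rough picture of what prototyping an attack and a defense can look like, the Python sketch below tampers with one agent's posted availability (a toy data-poisoning attack) and then filters posts by an HMAC tag. This defense is an assumption chosen for illustration, not one the paper is known to use, and the `KEYS` table, `sign`, `verified`, and `earliest_common_slot` names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical per-agent secrets; a real system would manage keys properly.
KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}


def sign(author, slots):
    """Tag a post with an HMAC over its JSON-encoded payload."""
    msg = json.dumps(slots).encode()
    return hmac.new(KEYS[author], msg, hashlib.sha256).hexdigest()


def verified(posts):
    """Keep only (author, slots) pairs whose tag matches the author's key."""
    return [
        (author, slots)
        for author, slots, tag in posts
        if author in KEYS and hmac.compare_digest(tag, sign(author, slots))
    ]


def earliest_common_slot(posts):
    """Return the earliest slot shared by every post, or None."""
    offers = [set(slots) for _, slots in posts]
    common = set.intersection(*offers) if offers else set()
    return min(common) if common else None


honest = [
    ("alice", [9, 10, 14], sign("alice", [9, 10, 14])),
    ("bob", [10, 14, 16], sign("bob", [10, 14, 16])),
]

# Attack: rewrite bob's slots on the board while reusing his old tag.
poisoned = [honest[0], ("bob", [9], honest[1][2])]

baseline = earliest_common_slot(verified(honest))
undefended = earliest_common_slot([(a, s) for a, s, _ in poisoned])
defended_posts = verified(poisoned)

print(baseline, undefended, [author for author, _ in defended_posts])
```

Without the check, the tampered entry silently moves the meeting; with it, the forged post is dropped and the missing author signals that something was altered. Swapping in a different attack or filter only requires replacing one function, which is the kind of iteration loop a testbed is meant to shorten.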

🔎 Similar Papers

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

2024-07-31 · arXiv.org · Citations: 5
