Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems

📅 2026-05-03

🤖 AI Summary
This work addresses infectious jailbreak in multi-agent systems, where the compromise of a single agent can spread to the others. The authors propose a training-free Foresight-Guided Local Purification (FLP) framework that replaces a globally shared cure factor with localized, per-agent reasoning over future interactions. Each agent simulates its future behavior under multiple personas and uses response diversity across those predictions as a diagnostic signal for infection. Recent infections are handled by immediate album rollback, and long-term infections by Recursive Binary Diagnosis (RBD), which recursively partitions the image album to locate and remove infection sources. FLP reduces the maximum cumulative infection rate from over 95% to below 5.47% while keeping retrieval and semantic metrics close to benign baselines.
📝 Abstract
Large multimodal model-based Multi-Agent Systems (MASs) enable collaborative complex problem solving through specialized agents. However, MASs are vulnerable to infectious jailbreak, where the compromise of a single agent can spread to others, leading to widespread compromise. Existing defenses counter this by training a more contagious cure factor, biasing agents to retrieve it over virus adversarial examples (VirAEs). However, this homogenizes agent responses, providing only superficial suppression rather than true recovery. We revisit these defenses and observe that they operate globally via a shared cure factor, whereas infectious jailbreaks arise from localized interaction behaviors. This mismatch limits their effectiveness. To address this, we propose a training-free Foresight-Guided Local Purification (FLP) framework, where each agent reasons over future interactions to track behavioral evolution and eliminate infections. Specifically, each agent simulates future behavioral trajectories over subsequent chat rounds. To reflect diversity in MASs, we introduce a multi-persona simulation strategy for robust prediction across interaction contexts. We then use response diversity as a diagnostic signal to detect infection by analyzing inconsistencies across persona-based predictions at both retrieval-result and semantic levels. For infected agents, we apply localized purification: recent infections are mitigated via immediate album rollback, while long-term infections are handled using Recursive Binary Diagnosis (RBD), which recursively partitions the image album and applies the same diagnosis strategy to localize and eliminate VirAEs. Experiments show that FLP reduces the maximum cumulative infection rate from over 95% to below 5.47%. Moreover, retrieval and semantic metrics closely match benign baselines, indicating effective preservation of interaction diversity.
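The recursive partitioning behind RBD can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the album is a list of image identifiers, `toy_score` is a hypothetical similarity function, `tau` is a hypothetical threshold, and the diagnostic `looks_infected` assumes that an infected album shows collapsed retrieval diversity, meaning every persona query retrieves the same image with a high score. The abstract does not specify the actual diagnostic rule, the semantic-level check, or the rollback step, so none of those are modeled.

```python
from typing import Callable, List, Sequence

# Hypothetical similarity function: (persona query, image id) -> score.
Score = Callable[[str, str], float]


def looks_infected(album: Sequence[str], queries: Sequence[str],
                   score: Score, tau: float = 0.5) -> bool:
    """Assumed diagnostic: retrieval has collapsed onto a single image
    that scores at least tau against every persona query."""
    if not album or not queries:
        return False
    # Top-1 retrieval result for each persona query.
    winners = {max(album, key=lambda img: score(q, img)) for q in queries}
    if len(winners) > 1:
        return False  # persona retrievals are diverse, treat as benign
    (winner,) = winners
    return min(score(q, winner) for q in queries) >= tau


def recursive_binary_diagnosis(album: Sequence[str], queries: Sequence[str],
                               score: Score, tau: float = 0.5) -> List[str]:
    """Return the album with suspected VirAEs removed by recursive halving."""
    if not looks_infected(album, queries, score, tau):
        return list(album)  # this partition passes the diagnostic, keep it
    if len(album) == 1:
        return []  # a single image that still fails the diagnostic is dropped
    mid = len(album) // 2
    return (recursive_binary_diagnosis(album[:mid], queries, score, tau)
            + recursive_binary_diagnosis(album[mid:], queries, score, tau))


def toy_score(query: str, image: str) -> float:
    """Toy stand-in: the adversarial image matches every query strongly."""
    if image.startswith("virae"):
        return 0.9
    return 0.8 if query in image else 0.1


queries = ["cat", "dog", "car"]  # one hypothetical query per simulated persona
album = ["cat_1", "dog_1", "virae_x", "car_1", "dog_2"]
cleaned = recursive_binary_diagnosis(album, queries, toy_score)
print(cleaned)
```

In this toy setting only the partitions that still fail the diagnostic are split further, so benign halves are kept intact after a single check and the search narrows toward the suspected VirAE.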
Problem

Research questions and friction points this paper is trying to address.

infectious jailbreak
multi-agent systems
adversarial examples
behavioral evolution
security vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foresight-Guided Defense
Local Purification
Multi-Agent Systems
Infectious Jailbreak
Recursive Binary Diagnosis