*Agents Under Siege*: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks

📅 2025-03-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a novel adversarial threat to multi-agent large language model (LLM) systems operating under distributed inference and bandwidth-constrained communication, exposing the vulnerability of existing safety mechanisms to cross-agent prompt interactions. To exploit this, we propose the first permutation-invariant prompt attack framework, which formulates attack-path selection as a maximum-flow minimum-cost problem and introduces the Permutation-Invariant Evasion Loss (PIEL) to jointly perturb inputs across multiple agents. Our method integrates graph optimization, network-flow theory, and multi-model adversarial prompting, applied to Llama, Mistral, Gemma, and DeepSeek. Evaluated on JailBreakBench and AdversarialBench, it achieves up to a 7× improvement in attack success rate over state-of-the-art baselines and evades prominent defenses including Llama-Guard and PromptGuard. This work establishes a new paradigm for security assessment of multi-agent LLM systems.

📝 Abstract
Most discussions of Large Language Model (LLM) safety have focused on single-agent settings, but multi-agent LLM systems now create novel adversarial risks because their behavior depends on inter-agent communication and decentralized reasoning. In this work, we focus on attacking pragmatic systems that operate under constraints such as limited token bandwidth, message-delivery latency, and defense mechanisms. We design a *permutation-invariant adversarial attack* that optimizes prompt distribution across latency- and bandwidth-constrained network topologies to bypass the system's distributed safety mechanisms. Formulating the attack path as a *maximum-flow minimum-cost* problem, coupled with the novel *Permutation-Invariant Evasion Loss (PIEL)*, we leverage graph-based optimization to maximize attack success rate while minimizing detection risk. Evaluating across models including `Llama`, `Mistral`, `Gemma`, `DeepSeek`, and other variants on datasets such as `JailBreakBench` and `AdversarialBench`, our method outperforms conventional attacks by up to 7×, exposing critical vulnerabilities in multi-agent systems. Moreover, we demonstrate that existing defenses, including variants of `Llama-Guard` and `PromptGuard`, fail to block our attack, underscoring the urgent need for multi-agent-specific safety mechanisms.
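The maximum-flow minimum-cost framing described above can be illustrated with an off-the-shelf solver. In the sketch below, agents are graph nodes, edge `capacity` stands in for the token-bandwidth budget of each communication channel, and edge `weight` for a per-unit detection-risk cost; the solver then routes as much adversarial payload as the topology allows at the least total risk. All node names, capacities, and costs are invented for illustration and are not the paper's actual graph construction; the sketch assumes `networkx` is available.

```python
import networkx as nx  # assumed available; not part of the paper's code

# Hypothetical 4-node agent topology.
# capacity: token-bandwidth budget of the channel
# weight:   per-unit detection-risk cost of using the channel
G = nx.DiGraph()
G.add_edge("entry", "agent_a", capacity=3, weight=1)
G.add_edge("entry", "agent_b", capacity=2, weight=2)
G.add_edge("agent_a", "agent_b", capacity=1, weight=1)
G.add_edge("agent_a", "target", capacity=2, weight=1)
G.add_edge("agent_b", "target", capacity=2, weight=1)

# Max-flow min-cost: deliver the largest possible payload to the target
# agent while paying the least cumulative detection-risk cost.
flow = nx.max_flow_min_cost(G, "entry", "target")
pushed = sum(flow["entry"].values())  # total payload units delivered
risk = nx.cost_of_flow(G, flow)       # total detection-risk cost incurred
```

On this toy graph the target's incoming capacity caps the flow at 4 units, and the cheapest routing pays a total cost of 10; the real attack would replace the toy weights with learned detection-risk estimates.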
Problem

Research questions and friction points this paper is trying to address.

Attacking multi-agent LLM systems with optimized prompt distribution
Bypassing safety mechanisms via permutation-invariant adversarial attacks
Exposing vulnerabilities in decentralized reasoning and communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

A permutation-invariant adversarial attack that optimizes how prompts are distributed across agents
Graph-based (max-flow min-cost) optimization that maximizes attack success rate
A novel PIEL objective that minimizes detection risk