Conjunctive Prompt Attacks in Multi-Agent LLM Systems

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Safety evaluations designed for single-agent systems miss vulnerabilities that arise in multi-agent large language model (LLM) systems from prompt segmentation and inter-agent routing. This work studies conjunctive prompt attacks, in which a trigger key in the user query and a hidden adversarial template embedded in one compromised remote agent each appear benign alone but elicit harmful behavior when routing brings them together. The attacker changes neither model weights nor the client agent and controls only trigger placement and template insertion. Across star, chain, and DAG topologies, routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low. Existing defenses, including PromptGuard, Llama-Guard variants, and system-level controls such as tool restrictions, do not reliably stop the attack because no single component appears malicious in isolation. The results motivate defenses that reason jointly over routing paths and cross-agent content composition.
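To make the role of topology concrete, the following is a minimal sketch of how often a query would pass through one compromised agent in star, chain, and DAG layouts. The graphs, the agent names (`client`, `a1`, `a2`, `a3`), and the assumption that every route is equally likely are all hypothetical; this is not the paper's routing model or its routing-aware optimization.

```python
from typing import Dict, List

# Hypothetical toy topologies: each agent maps to the agents it forwards to.
STAR: Dict[str, List[str]] = {"client": ["a1", "a2", "a3"]}
CHAIN: Dict[str, List[str]] = {"client": ["a1"], "a1": ["a2"], "a2": ["a3"]}
DAG: Dict[str, List[str]] = {"client": ["a1", "a2"], "a1": ["a3"], "a2": ["a3"]}


def all_routes(graph: Dict[str, List[str]], start: str = "client") -> List[List[str]]:
    """Enumerate every path from `start` to a sink in an acyclic graph."""
    successors = graph.get(start, [])
    if not successors:
        return [[start]]
    routes = []
    for nxt in successors:
        for tail in all_routes(graph, nxt):
            routes.append([start] + tail)
    return routes


def exposure(graph: Dict[str, List[str]], compromised: str) -> float:
    """Fraction of routes that include the compromised agent.

    Assumes (for illustration only) that all routes are equally likely.
    """
    routes = all_routes(graph)
    hits = sum(1 for route in routes if compromised in route)
    return hits / len(routes)


if __name__ == "__main__":
    for name, graph in [("star", STAR), ("chain", CHAIN), ("DAG", DAG)]:
        print(name, exposure(graph, "a2"))
```

In this toy setting, a chain exposes every query to the compromised agent, while a star exposes only the queries routed to that branch. This is one reason an attacker would care about where the trigger and the template are placed relative to the routing.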

📝 Abstract

Most LLM safety work studies single-agent models, but many real applications rely on multiple interacting agents. In these systems, prompt segmentation and inter-agent routing create attack surfaces that single-agent evaluations miss. We study conjunctive prompt attacks, where a trigger key in the user query and a hidden adversarial template in one compromised remote agent each appear benign alone but activate harmful behavior when routing brings them together. We consider an attacker who changes neither model weights nor the client agent and instead controls only trigger placement and template insertion. Across star, chain, and DAG topologies, routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low. Existing defenses, including PromptGuard, Llama-Guard variants, and system-level controls such as tool restrictions, do not reliably stop the attack because no single component appears malicious in isolation. These results expose a structural vulnerability in agentic LLM pipelines and motivate defenses that reason over routing and cross-agent composition. Code is available at https://github.com/UCF-ML-Research/ConjunctiveAgents.
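The conjunctive condition can be illustrated with a small toy model. The sketch below uses inert placeholder strings (`<KEY>` and `<TEMPLATE>`) and a hypothetical `Agent` type; `looks_malicious` is a stand-in for any per-component filter and does not model PromptGuard, Llama-Guard, or the paper's implementation. It shows only the structural point: each piece passes inspection alone, and the flagged condition appears only in the prompt composed along the route.

```python
from dataclasses import dataclass
from typing import List

TRIGGER_KEY = "<KEY>"          # inert placeholder for the trigger key
TEMPLATE_MARK = "<TEMPLATE>"   # inert placeholder for the hidden template


@dataclass
class Agent:
    name: str
    context: str  # text this agent contributes to the composed prompt


def looks_malicious(text: str) -> bool:
    """Toy filter: flags text only when key and template appear together."""
    return TRIGGER_KEY in text and TEMPLATE_MARK in text


def composed_prompt(query: str, route: List[Agent]) -> str:
    """Concatenate the user query with the context of each agent on the route."""
    return " ".join([query] + [agent.context for agent in route])


def per_component_check(query: str, route: List[Agent]) -> bool:
    """Inspect the query and each agent's context in isolation."""
    return looks_malicious(query) or any(looks_malicious(a.context) for a in route)


def composition_check(query: str, route: List[Agent]) -> bool:
    """Inspect the prompt as it exists after routing has combined the pieces."""
    return looks_malicious(composed_prompt(query, route))


benign = Agent("benign", "Summarize the request.")
compromised = Agent("compromised", f"Follow house style. {TEMPLATE_MARK}")

keyed_query = f"Plan my trip {TRIGGER_KEY}"
plain_query = "Plan my trip"
```

With these definitions, `per_component_check(keyed_query, [compromised])` is `False` while `composition_check(keyed_query, [compromised])` is `True`. A keyed query routed only to `benign`, or a plain query routed to `compromised`, triggers neither check, which corresponds to the low false-activation behavior the abstract describes.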

Problem

Research questions and friction points this paper is trying to address.

conjunctive prompt attacks

multi-agent LLM systems

adversarial routing

structural vulnerability

cross-agent composition

Innovation

Methods, ideas, or system contributions that make the work stand out.

conjunctive prompt attacks

multi-agent LLM systems

routing-aware optimization