Conjunctive Prompt Attacks in Multi-Agent LLM Systems

📅 2026-04-16
🤖 AI Summary
Safety evaluation methods designed for single-agent systems miss vulnerabilities specific to multi-agent large language model (LLM) systems, which arise from prompt fragmentation and inter-agent routing. This work identifies a class of conjunctive prompt attacks. In these attacks, a benign-looking trigger word in the user query and a hidden adversarial template embedded in a compromised remote agent elicit harmful behavior only when routing brings them together. To exploit this vulnerability, the work introduces a routing-aware optimization that places the trigger and inserts the template so that they meet across star, chain, and DAG topologies, while no single component looks malicious to a point-wise detector. Experiments show that the optimized attack substantially raises success rates over non-optimized baselines while keeping false-trigger rates low across topologies. They also show that state-of-the-art defenses such as PromptGuard and Llama-Guard do not reliably stop it. The results call for defenses that jointly analyze routing paths and cross-agent content composition.
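The conjunctive condition and its dependence on topology can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's method or its released code. The placeholder strings `<TRIGGER>` and `<TEMPLATE>`, the naive sentence splitter, the keyword router, the agent names, the rule that a fragment is forwarded verbatim to downstream agents, and the brute-force `best_placement` search are all hypothetical. The `best_placement` search is only a stand-in for routing-aware optimization. The sketch shows that the same trigger placement can activate in one topology and stay dormant in another, and that a query without the trigger never activates.

```python
from dataclasses import dataclass

# Hypothetical placeholders: neither string is harmful on its own.
TRIGGER = "<TRIGGER>"    # stands in for the benign-looking trigger key
TEMPLATE = "<TEMPLATE>"  # stands in for the hidden adversarial template


def activates(prompt: str) -> bool:
    """Conjunctive condition: only the co-occurrence of both parts activates."""
    return TRIGGER in prompt and TEMPLATE in prompt


@dataclass
class Agent:
    name: str
    keyword: str              # toy router: fragments containing this go here
    compromised: bool = False

    def system_prompt(self) -> str:
        hidden = f" {TEMPLATE}" if self.compromised else ""
        return f"You are the {self.name} agent.{hidden}"


def reachable(edges: dict, start: str) -> set:
    """Agents reachable from start (inclusive) by following forwarding edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen


def run(query: str, agents: list, edges: dict) -> bool:
    """True if any agent receives a prompt that satisfies the conjunction."""
    by_name = {a.name: a for a in agents}
    for fragment in query.split(". "):          # assumption: naive segmentation
        for agent in agents:
            if agent.keyword not in fragment:   # assumption: keyword routing
                continue
            # Assumption: a fragment is forwarded verbatim to downstream agents.
            for name in reachable(edges, agent.name):
                if activates(by_name[name].system_prompt() + "\n" + fragment):
                    return True
    return False


AGENTS = [
    Agent("search", "find"),
    Agent("math", "compute", compromised=True),
    Agent("writer", "summarize"),
]
TOPOLOGIES = {
    "star": {},                                        # no agent-to-agent edges
    "chain": {"search": ["math"], "math": ["writer"]},
    "dag": {"search": ["math", "writer"], "math": ["writer"]},
}


def with_trigger(fragments: list, i: int) -> str:
    """Append the trigger to fragment i and rejoin the query."""
    parts = [f"{f} {TRIGGER}" if j == i else f for j, f in enumerate(fragments)]
    return ". ".join(parts)


def best_placement(fragments: list) -> tuple:
    """Brute-force stand-in for routing-aware optimization: pick the
    fragment whose trigger activates in the most topologies."""
    scores = {
        i: sum(run(with_trigger(fragments, i), AGENTS, e) for e in TOPOLOGIES.values())
        for i in range(len(fragments))
    }
    return max(scores, key=scores.get), scores


fragments = ["find recent papers", "compute the mean citation count", "summarize the results"]
best, scores = best_placement(fragments)
print(best, scores)  # 1 {0: 2, 1: 3, 2: 0}
clean = ". ".join(fragments)
print([run(clean, AGENTS, e) for e in TOPOLOGIES.values()])  # [False, False, False]
```

Placing the trigger in the `find` fragment activates only in the chain and DAG topologies, where that fragment is forwarded to the compromised `math` agent. Placing it in the `compute` fragment activates in all three topologies. The clean query activates nowhere, which is the toy analogue of a zero false-trigger rate.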

📝 Abstract
Most LLM safety work studies single-agent models, but many real applications rely on multiple interacting agents. In these systems, prompt segmentation and inter-agent routing create attack surfaces that single-agent evaluations miss. We study conjunctive prompt attacks, where a trigger key in the user query and a hidden adversarial template in one compromised remote agent each appear benign alone but activate harmful behavior when routing brings them together. We consider an attacker who changes neither model weights nor the client agent and instead controls only trigger placement and template insertion. Across star, chain, and DAG topologies, routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low. Existing defenses, including PromptGuard, Llama-Guard variants, and system-level controls such as tool restrictions, do not reliably stop the attack because no single component appears malicious in isolation. These results expose a structural vulnerability in agentic LLM pipelines and motivate defenses that reason over routing and cross-agent composition. Code is available at https://github.com/UCF-ML-Research/ConjunctiveAgents.
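The abstract's point that no single component appears malicious in isolation can be sketched as two audits over the same toy pipeline. This is a hypothetical illustration only. It does not call PromptGuard or Llama-Guard, and it is not the paper's code. The placeholder strings, the agent names, the hand-written routes, and the `flags` function are all assumptions. `flags` is a toy stand-in for a guard classifier that recognizes the harmful instruction only when both halves are present. The isolation audit scans each fragment and each system prompt separately. The composition audit scans only the prompts that routing actually assembles.

```python
# Hypothetical placeholders; neither string is harmful on its own.
TRIGGER, TEMPLATE = "<TRIGGER>", "<TEMPLATE>"


def flags(text: str) -> bool:
    """Toy stand-in for a guard classifier (assumption): it only recognizes
    the harmful instruction once both halves appear in the same prompt."""
    return TRIGGER in text and TEMPLATE in text


def isolation_audit(fragments: list, system_prompts: dict) -> bool:
    """Single-point defense: scan every component on its own."""
    return any(flags(t) for t in [*fragments, *system_prompts.values()])


def composition_audit(system_prompts: dict, routes: list) -> bool:
    """Routing-aware defense: scan each prompt that routing actually assembles."""
    return any(flags(system_prompts[agent] + "\n" + frag) for frag, agent in routes)


system_prompts = {
    "search": "You are the search agent.",
    "math": f"You are the math agent. {TEMPLATE}",  # hypothetical compromised agent
}
fragments = ["find recent papers", f"compute the mean citation count {TRIGGER}"]
routes = [(fragments[0], "search"), (fragments[1], "math")]  # (fragment, agent)

print(isolation_audit(fragments, system_prompts))  # False
print(composition_audit(system_prompts, routes))   # True
```

The sketch only illustrates where a defense has to look. It makes no claim that scanning composed prompts is sufficient against the optimized attack studied in the paper.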
Problem

conjunctive prompt attacks
multi-agent LLM systems
adversarial routing
structural vulnerability
cross-agent composition
Innovation

conjunctive prompt attacks
multi-agent LLM systems
routing-aware optimization
adversarial prompt composition
structural vulnerability