🤖 AI Summary
This work addresses the vulnerability of large language model (LLM)-based multi-agent systems to cross-channel, distributed adversarial attacks that can trigger stealthy cascading failures. The authors propose CASPIAN, which they describe as the first unified cross-channel causal modeling framework; it uses late-interaction conditional transfer entropy (LI-CTE) to efficiently construct dynamic causal influence matrices. This enables online monitoring, early warning, and causal attribution of cascading attacks, identifying origin, bridge, and amplifier agents and reconstructing propagation pathways. Evaluated across multiple multi-agent frameworks and benchmarks, the method consistently outperforms existing approaches (semantic guardrails, LLM-based judges, and graph-based anomaly detectors) in both detection accuracy and early identification, with a relative latency overhead below 1%.
📝 Abstract
Cascade attacks in LLM multi-agent systems (LLM-MAS) arise when adversarial influence propagates across agents and escalates into system-level failures through complex agent interactions. Detecting such cascades is challenging: their signals are distributed and tightly coupled across interaction channels, each signal can look plausibly benign in isolation, and a cascade may unfold rapidly within a single turn or gradually across multiple turns. Existing defenses are largely local and text-centric, so they fail to capture the cross-channel, temporally coordinated dynamics of cascade propagation. We therefore propose CASPIAN, the first framework that provides a unified, cross-channel causal analysis of cascade behavior in LLM-MAS through online monitoring of dynamic influence propagation across agents. CASPIAN models multi-agent interactions with a unified, dynamic causal influence matrix across channels, estimated efficiently via a late-interaction conditional transfer entropy (LI-CTE) formulation. This lets it detect cascade onset from emergent system-level structure rather than from isolated anomalies. It further performs online causal attribution, identifying the origin, bridge, and amplifier agents driving the cascade and reconstructing its principal propagation pathways, capabilities not supported by existing methods. Across diverse multi-agent frameworks and benchmarks, CASPIAN consistently outperforms semantic guardrails, LLM-based judges, and graph-based anomaly detectors in both detection accuracy and early cascade identification, while adding a relative latency overhead below 1%. These results demonstrate that unified cross-channel causal modeling is essential for reliably detecting and understanding cascade failures in LLM multi-agent systems.
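To make the central quantity concrete, the sketch below computes a textbook conditional transfer entropy on discrete time series and arranges the pairwise values into an influence matrix. It is not the LI-CTE estimator: the abstract does not specify the late-interaction formulation, so the count-based (plug-in) estimator, the one-step history, the discretized per-agent signals, and the names `conditional_transfer_entropy` and `influence_matrix` are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def conditional_transfer_entropy(src, dst, cond):
    """Plug-in estimate (in bits) of the transfer entropy src -> dst given cond.

    Assumption: all inputs are equal-length sequences of hashable symbols,
    and a history of one time step is used for every variable.
    """
    # Each row is (dst_next, dst_prev, src_prev, cond_prev).
    rows = list(zip(dst[1:], dst[:-1], src[:-1], cond[:-1]))
    n = len(rows)
    full = Counter(rows)
    ctx_full = Counter((y0, x0, z0) for _, y0, x0, z0 in rows)
    reduced = Counter((y1, y0, z0) for y1, y0, _, z0 in rows)
    ctx_reduced = Counter((y0, z0) for _, y0, _, z0 in rows)

    te = 0.0
    for (y1, y0, x0, z0), count in full.items():
        # p(dst_next | dst_prev, src_prev, cond_prev)
        p_full = count / ctx_full[(y0, x0, z0)]
        # p(dst_next | dst_prev, cond_prev)
        p_reduced = reduced[(y1, y0, z0)] / ctx_reduced[(y0, z0)]
        te += (count / n) * log2(p_full / p_reduced)
    return te


def influence_matrix(series):
    """Map each ordered pair (a, b) to the estimate of a -> b given all other agents."""
    names = list(series)
    length = len(series[names[0]])
    matrix = {}
    for a in names:
        for b in names:
            if a == b:
                continue
            others = [series[k] for k in names if k not in (a, b)]
            # Joint state of the remaining agents; empty tuples if there are none.
            cond = list(zip(*others)) if others else [()] * length
            matrix[(a, b)] = conditional_transfer_entropy(series[a], series[b], cond)
    return matrix


# Hypothetical toy data: agent z drives both x and y with a one-step lag.
z = [0, 1, 1, 0, 1, 0, 0, 1]
x = [0] + z[:-1]
y = [0] + z[:-1]
agents = {"z": z, "x": x, "y": y}
M = influence_matrix(agents)

for (a, b), value in sorted(M.items()):
    print(f"{a} -> {b}: {value:.3f} bits")
```

In this toy setup, conditioning on z removes the apparent x -> y influence that a purely pairwise measure would report, while z -> y stays large. That is the kind of distinction a causal influence matrix is meant to capture, though a plug-in estimator on sequences this short is far too noisy for real monitoring.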