🤖 AI Summary
This work addresses the challenge of mounting stealthy yet effective attacks on monitored multi-agent systems, where existing attack methods lack subtlety and are therefore frequently detected and blocked by anomaly detection mechanisms. The study is the first to systematically investigate the feasibility of covert attacks under surveillance, proposing a novel attack framework that explicitly models the anomaly detection mechanism, analyzes agent behaviors, and employs a strategic agent-selection algorithm to evade monitoring while still achieving successful infiltration. Experimental results demonstrate that the proposed approach significantly outperforms existing attack strategies across diverse monitoring configurations. These findings show that reliance on monitoring alone is insufficient to secure such systems and underscore the critical need for adversary-aware defensive mechanisms.
📝 Abstract
Multi-agent discussion systems have been widely adopted, motivating growing efforts to develop attacks that expose their vulnerabilities. In this work, we study a practical yet largely unexplored setting, the discussion-monitored scenario, in which anomaly detectors continuously monitor inter-agent communications and block any detected adversarial messages. Although existing attacks are effective in the absence of such monitoring, we show that they exhibit detectable patterns and largely fail under these monitoring constraints. Does this imply that monitoring alone is sufficient to secure multi-agent discussions? To answer this question, we develop a novel attack method explicitly tailored to the discussion-monitored scenario. Extensive experiments demonstrate that effective attacks remain possible even under continuous monitoring, indicating that monitoring alone does not eliminate adversarial risk.