🤖 AI Summary
Existing methods for inferring the communication topology of large language model (LLM) multi-agent systems rely on unrealistic assumptions and are easily thwarted by basic defense mechanisms, so they understate the real-world threat to this high-value intellectual property. The paper proposes WebWeaver, a context-driven inference framework that compromises only a single arbitrary agent, without needing control over the administrative agent, and uses agent contexts rather than agent IDs to stealthily reconstruct the global communication topology. It combines a covert jailbreak-based mechanism with a fully jailbreak-free diffusion-based inference strategy for cases where jailbreaks fail, and introduces a topology masking technique that preserves known topology during diffusion with theoretical correctness guarantees. Experiments show that, under active defenses, the approach achieves about 60% higher inference accuracy than state-of-the-art baselines with negligible overhead.
📝 Abstract
Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied.
Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world threat of such attacks.
To bridge this realism gap, we propose WebWeaver, an attack framework that infers the complete LLM-MAS topology by compromising only a single arbitrary agent instead of the administrative agent.
Unlike prior approaches, WebWeaver relies solely on agent contexts rather than agent IDs, enabling significantly stealthier inference.
WebWeaver further introduces a new covert jailbreak-based mechanism and a novel fully jailbreak-free diffusion design to handle cases where jailbreaks fail.
Additionally, we address a key challenge in diffusion-based inference by proposing a masking strategy that preserves known topology during diffusion, with theoretical guarantees of correctness.
Extensive experiments show that WebWeaver substantially outperforms state-of-the-art (SOTA) baselines, achieving about 60% higher inference accuracy under active defenses with negligible overhead.
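The masking idea mentioned in the abstract, keeping already-known parts of the topology fixed while the rest is refined, can be illustrated with a toy sketch. This is not WebWeaver's actual diffusion model, which the abstract does not describe: the function name `masked_refine`, the random update rule, and the 0.5 threshold are all assumptions made purely for illustration. The only property the sketch demonstrates is that re-imposing known entries after every step guarantees they survive unchanged in the output.

```python
import random


def masked_refine(known, mask, steps=10, seed=0):
    """Toy iterative refinement of an adjacency matrix (hypothetical example).

    known: n x n list of 0/1 values; trusted only where mask is 1.
    mask:  n x n list of 0/1 flags marking entries that are already known.
    The update rule is a random stand-in, NOT the paper's diffusion model.
    """
    rng = random.Random(seed)
    n = len(known)

    def apply_mask(x):
        # Overwrite every trusted entry with its known value.
        for i in range(n):
            for j in range(n):
                if mask[i][j]:
                    x[i][j] = float(known[i][j])
        return x

    # Start from noise, then immediately pin the known entries.
    x = apply_mask([[rng.random() for _ in range(n)] for _ in range(n)])

    for _ in range(steps):
        # Placeholder update: perturb each entry randomly and clip to [0, 1].
        x = [
            [min(1.0, max(0.0, v + rng.uniform(-0.2, 0.2))) for v in row]
            for row in x
        ]
        # Masking step: restore the known entries after every update.
        x = apply_mask(x)

    # Threshold to a binary adjacency matrix.
    return [[1 if v >= 0.5 else 0 for v in row] for row in x]


# Usage: the top-left 2x2 block is assumed known; the rest is left to refinement.
known = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
mask = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
result = masked_refine(known, mask, steps=5, seed=42)
```

Because the mask is applied after the final update, every known entry is exactly 0.0 or 1.0 before thresholding, so the output agrees with `known` wherever `mask` is set regardless of what the update rule does to the other entries.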