🤖 AI Summary
This work addresses a critical vulnerability in tool-augmented large language model (LLM) agents that rely on third-party tool metadata, exposing them to supply-chain attacks that induce a structural “overthinking” loop. Unlike conventional token-redundancy attacks, this threat—formally defined for the first time in this paper—enables malicious MCP tools to trigger excessive, seemingly legitimate invocation sequences, leading to up to a 142.4× increase in end-to-end token consumption and significant degradation in task performance. Through the construction of 14 adversarial tools across three server types employing repetition, forced refinement, and interference strategies, the study demonstrates the attack’s prevalence across diverse LLMs and tool ecosystems. Crucially, existing decoding-stage simplicity constraints prove ineffective against this class of exploits, highlighting a pressing need for new defensive mechanisms.
📝 Abstract
Tool-using LLM agents increasingly coordinate real workloads by selecting and chaining third-party tools based on text-visible metadata such as tool names, descriptions, and return messages. We show that this convenience creates a supply-chain attack surface: a malicious MCP tool server can be co-registered alongside normal tools and induce overthinking loops, where individually trivial or plausible tool calls compose into cyclic trajectories that inflate end-to-end tokens and latency without any single step looking abnormal. We formalize this as a structural overthinking attack, distinguishable from token-level verbosity, and implement 14 malicious tools across three servers that trigger repetition, forced refinement, and distraction. Across heterogeneous registries and multiple tool-capable models, the attack causes severe resource amplification (up to $142.4\times$ tokens) and can degrade task outcomes. Finally, we find that decoding-time concision controls do not reliably prevent loop induction, suggesting defenses should reason about tool-call structure rather than tokens alone.