🤖 AI Summary
Systematic security evaluation of Model Context Protocols (MCP) remains severely underexplored, with existing studies largely confined to qualitative or narrow-scope analyses that fail to capture the diversity of real-world threats. Method: This work introduces the first comprehensive MCP attack taxonomy covering 31 distinct attacks, proposes MCPLIB—a unified attack framework—and conducts empirical evaluation via quantitative experiments and simulations across four vulnerability pathways: direct/indirect tool injection, malicious user interaction, and inherent LLM deficiencies. Contribution/Results: We identify critical vulnerabilities—including agents’ blind trust in tool descriptions, sensitivity to file parsing, feasibility of multi-step chained attacks, and context pollution risks. Our findings establish the first empirically grounded security benchmark for MCP, provide reproducible attack patterns, and offer concrete directions for robust protocol design and defense mechanisms.
📝 Abstract
The Model Context Protocol (MCP) has emerged as a universal standard that enables AI agents to seamlessly connect with external tools, significantly enhancing their functionality. However, while MCP brings notable benefits, it also introduces significant vulnerabilities, such as Tool Poisoning Attacks (TPA), where hidden malicious instructions exploit the sycophancy of large language models (LLMs) to manipulate agent behavior. Despite these risks, current academic research on MCP security remains limited, with most studies focusing on narrow or qualitative analyses that fail to capture the diversity of real-world threats. To address this gap, we present the MCP Attack Library (MCPLIB), which categorizes and implements 31 distinct attack methods under four key classifications: direct tool injection, indirect tool injection, malicious user attacks, and LLM inherent attack. We further conduct a quantitative analysis of the efficacy of each attack. Our experiments reveal key insights into MCP vulnerabilities, including agents' blind reliance on tool descriptions, sensitivity to file-based attacks, chain attacks exploiting shared context, and difficulty distinguishing external data from executable commands. These insights, validated through attack experiments, underscore the urgency for robust defense strategies and informed MCP design. Our contributions include 1) constructing a comprehensive MCP attack taxonomy, 2) introducing a unified attack framework MCPLIB, and 3) conducting empirical vulnerability analysis to enhance MCP security mechanisms. This work provides a foundational framework, supporting the secure evolution of MCP ecosystems.