MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

📅 2025-10-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work systematically evaluates security threats faced by large language model (LLM) agents when invoking external tools under the Model Context Protocol (MCP). To address MCP-specific vulnerabilities, we propose the first end-to-end evaluation framework covering the full pipeline: task planning, tool invocation, and response processing. We establish a fine-grained taxonomy of 12 attack classes—including natural-language metadata injection, tool description poisoning, parameter overflow requests, and identity-spoofed responses—implemented atop real-world MCP toolchains. Empirical evaluation spans 400+ tools across 10 domains, comprising 2,000 test cases. We introduce Net Resilience Performance (NRP), a novel metric quantifying the trade-off between security robustness and functional performance. Results reveal that high-performing LLMs—due to their strong instruction-following capability—are paradoxically more susceptible to such attacks. This study establishes the first benchmark for MCP security assessment, enabling rigorous, standardized evaluation of agent-tool interaction safety.

Technology Category

Application Category

📝 Abstract
The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. While MCP unlocks broad interoperability, it also enlarges the attack surface by making tools first-class, composable objects with natural-language metadata, and standardized I/O. We present MSB (MCP Security Benchmark), the first end-to-end evaluation suite that systematically measures how well LLM agents resist MCP-specific attacks throughout the full tool-use pipeline: task planning, tool invocation, and response handling. MSB contributes: (1) a taxonomy of 12 attacks including name-collision, preference manipulation, prompt injections embedded in tool descriptions, out-of-scope parameter requests, user-impersonating responses, false-error escalation, tool-transfer, retrieval injection, and mixed attacks; (2) an evaluation harness that executes attacks by running real tools (both benign and malicious) via MCP rather than simulation; and (3) a robustness metric that quantifies the trade-off between security and performance: Net Resilient Performance (NRP). We evaluate nine popular LLM agents across 10 domains and 400+ tools, producing 2,000 attack instances. Results reveal the effectiveness of attacks against each stage of MCP. Models with stronger performance are more vulnerable to attacks due to their outstanding tool calling and instruction following capabilities. MSB provides a practical baseline for researchers and practitioners to study, compare, and harden MCP agents.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking security vulnerabilities in LLM agents using Model Context Protocol
Systematically evaluating resistance to 12 attack types across tool-use pipeline
Quantifying security-performance tradeoff for MCP agents through real tool execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarking security attacks on Model Context Protocol
Evaluating LLM agent robustness with real tool execution
Quantifying security-performance trade-off via Net Resilient Performance
🔎 Similar Papers
No similar papers found.
D
Dongsen Zhang
Beijing University of Posts and Telecommunications
Z
Zekun Li
University of California, Santa Barbara
Xu Luo
Xu Luo
UESTC
Machine LearningRobotics
X
Xuannan Liu
Beijing University of Posts and Telecommunications
Peipei Li
Peipei Li
Beijing University of Posts and Telecommunications (BUPT)
Computer VisionImage SynthesisFace Recognition
Wenjun Xu
Wenjun Xu
Peng Cheng Laboratory
machine learningreinforcement learningflexible/soft robot