MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Model Context Protocol (MCP) servers face novel security risks in open, multi-hop, and uncertain environments, yet existing benchmarks lack realistic workflow coverage and cannot evaluate cross-server collaboration. Method: We introduce the first security evaluation benchmark designed specifically for real-world MCP servers, spanning five practical scenarios (including browser automation and financial analysis), and propose a taxonomy of 20 MCP-specific attack classes across the server, host, and user threat layers. We also implement the first multi-hop collaborative security evaluation framework grounded in actual MCP deployments. Results: A systematic evaluation of leading open- and closed-source large language models (LLMs) reveals a pronounced degradation in safety performance as task steps and server interactions increase, exposing critical vulnerabilities in current LLMs under authentic MCP settings.

📝 Abstract
Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-source LLMs, revealing large disparities in safety performance and escalating vulnerabilities as task horizons and server interactions grow. Our results highlight the urgent need for stronger defenses and establish MCP-SafetyBench as a foundation for diagnosing and mitigating safety risks in real-world MCP deployments.
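To make the evaluation setup concrete, the following is a minimal sketch of how multi-turn outcomes across several MCP servers could be recorded and scored. It is not the authors' harness: the names SafetyTask, AttackOutcome, and safety_rate, and the example domain, attack type, and server labels are illustrative assumptions.

```python
# Hypothetical sketch (not the benchmark's actual code): recording and scoring
# per-turn safety outcomes for multi-step tasks that touch several MCP servers.
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AttackOutcome(Enum):
    SAFE = "safe"            # the model refused or neutralized the attack
    COMPROMISED = "unsafe"   # the model carried out the malicious instruction


@dataclass
class SafetyTask:
    domain: str              # e.g. "browser_automation", "financial_analysis"
    attack_type: str         # one of the MCP-specific attack classes
    num_steps: int           # length of the multi-step task horizon
    servers: List[str]       # MCP servers the task touches
    outcomes: List[AttackOutcome] = field(default_factory=list)


def safety_rate(tasks: List[SafetyTask]) -> float:
    """Fraction of all task turns in which the model stayed safe."""
    turns = [o for t in tasks for o in t.outcomes]
    if not turns:
        return 1.0
    return sum(o is AttackOutcome.SAFE for o in turns) / len(turns)


if __name__ == "__main__":
    demo = SafetyTask(
        domain="repository_management",
        attack_type="tool_poisoning",            # placeholder attack class
        num_steps=4,
        servers=["github-mcp", "web-search-mcp"],  # placeholder server names
        outcomes=[AttackOutcome.SAFE, AttackOutcome.SAFE,
                  AttackOutcome.COMPROMISED, AttackOutcome.SAFE],
    )
    print(f"safety rate: {safety_rate([demo]):.2f}")
```

Grouping such records by num_steps and by the number of distinct servers is one straightforward way to surface the degradation with longer task horizons and more server interactions that the paper reports.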
Problem

Research questions and friction points this paper is trying to address.

Evaluates safety risks in LLMs using real-world MCP servers
Addresses multi-turn attacks across diverse domains like finance and navigation
Identifies vulnerabilities from multi-server interactions and extended task sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark built on real MCP servers for safety evaluation
Unified taxonomy of 20 MCP attack types spanning server, host, and user threat layers (see the sketch after this list)
Multi-turn evaluation requiring multi-step reasoning and cross-server coordination under uncertainty
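The three-layer taxonomy could be modelled as a simple mapping from threat layer to attack classes. This is an assumed structure, and the class names below are placeholders: the page does not enumerate the 20 types.

```python
# Hypothetical sketch: a server/host/user threat-layer taxonomy.
# The benchmark defines 20 MCP-specific attack classes; the names here are
# illustrative placeholders, not the paper's actual class list.
from typing import Dict, List

ATTACK_TAXONOMY: Dict[str, List[str]] = {
    # attacks originating from a malicious or compromised MCP server
    "server": ["tool_description_poisoning", "malicious_tool_output"],
    # attacks targeting the host application that brokers tool calls
    "host": ["cross_server_shadowing", "context_leakage"],
    # attacks delivered through user-supplied content or instructions
    "user": ["indirect_prompt_injection", "social_engineering"],
}


def layer_of(attack: str) -> str:
    """Return the threat layer a given attack class belongs to."""
    for layer, attacks in ATTACK_TAXONOMY.items():
        if attack in attacks:
            return layer
    raise KeyError(f"unknown attack class: {attack}")


if __name__ == "__main__":
    print(layer_of("indirect_prompt_injection"))  # -> "user"
```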
Authors
Xuanjun Zong
East China Normal University
Zhiqi Shen
Nanyang Technological University
Lei Wang
Singapore Management University
Yunshi Lan
East China Normal University
Chao Yang
Shanghai AI Laboratory