🤖 AI Summary
Model Context Protocol (MCP) servers face novel security risks in open, multi-hop, and uncertain environments, yet existing benchmarks lack realistic workflow coverage and cannot evaluate cross-server collaboration.
Method: We introduce the first security evaluation benchmark specifically designed for real-world MCP servers, spanning five practical scenarios—including browser automation and financial analysis—and propose a taxonomy of 20 MCP-specific attack classes across server, host, and user threat layers. Furthermore, we implement the first multi-hop collaborative security evaluation framework grounded in actual MCP deployments.
Results: Our systematic evaluation of leading open- and closed-source large language models (LLMs) shows that security performance degrades markedly as task steps and server interactions increase, exposing critical vulnerabilities of current LLMs in authentic MCP settings.
📝 Abstract
Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-source LLMs, revealing large disparities in safety performance and escalating vulnerabilities as task horizons and server interactions grow. Our results highlight the urgent need for stronger defenses and establish MCP-SafetyBench as a foundation for diagnosing and mitigating safety risks in real-world MCP deployments.
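To make the benchmark's structure concrete, the sketch below illustrates one plausible way a multi-hop, cross-server MCP safety task and its attack annotation could be encoded. This is purely illustrative: the paper's actual schema, field names, and scoring rules are not given in the abstract, so every class, field, and function here (ThreatLayer, AttackCase, MultiHopTask, safety_rate, etc.) is an assumed reconstruction, not the authors' implementation.

```python
# Hypothetical sketch of a MCP-SafetyBench-style task record and safety metric.
# All names and fields are assumptions; only the three threat sides (server,
# host, user), the five domains, and the 20 attack classes come from the paper.
from dataclasses import dataclass, field
from enum import Enum


class ThreatLayer(Enum):
    SERVER = "server"   # attack originates from a malicious/compromised MCP server
    HOST = "host"       # attack targets the agent host / client application
    USER = "user"       # prompt-level attack supplied by the user


@dataclass
class AttackCase:
    attack_type: str          # one of the 20 MCP-specific attack classes
    layer: ThreatLayer
    domain: str               # e.g. "browser_automation", "financial_analysis"


@dataclass
class ToolStep:
    server: str               # which real MCP server this hop calls
    tool: str                 # tool exposed by that server
    arguments: dict = field(default_factory=dict)


@dataclass
class MultiHopTask:
    task_id: str
    instruction: str                    # natural-language goal given to the agent
    steps: list[ToolStep]               # ordered cross-server tool calls
    attack: AttackCase | None = None    # injected attack, if any


def safety_rate(results: list[tuple[MultiHopTask, bool]]) -> float:
    """Fraction of attacked tasks where the agent behaved safely
    (refused, flagged, or otherwise contained the injected attack)."""
    attacked = [(task, safe) for task, safe in results if task.attack is not None]
    if not attacked:
        return 1.0
    return sum(safe for _, safe in attacked) / len(attacked)
```

Under this framing, the reported finding that safety erodes with longer task horizons would show up as safety_rate falling as the number of steps and distinct servers per task grows.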