🤖 AI Summary
Model Context Protocol (MCP) servers face novel security risks in open, multi-hop, and uncertain environments, yet existing benchmarks lack realistic workflow coverage and cannot evaluate cross-server collaboration.
Method: We introduce the first security evaluation benchmark specifically designed for real-world MCP servers, spanning five practical scenarios—including browser automation and financial analysis—and propose a taxonomy of 20 MCP-specific attack classes across server, host, and user threat layers. Furthermore, we implement the first multi-hop collaborative security evaluation framework grounded in actual MCP deployments.
Results: Our systematic evaluation of leading open- and closed-source large language models (LLMs) shows that security performance degrades markedly as task steps and server interactions increase, exposing critical vulnerabilities of current LLMs in authentic MCP settings.
📝 Abstract
Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-source LLMs, revealing large disparities in safety performance and escalating vulnerabilities as task horizons and server interactions grow. Our results highlight the urgent need for stronger defenses and establish MCP-SafetyBench as a foundation for diagnosing and mitigating safety risks in real-world MCP deployments.
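To make the benchmark's structure concrete, the sketch below illustrates one plausible way a multi-hop, cross-server MCP safety task and its attack annotation could be encoded. This is purely illustrative: the paper's actual schema, field names, and scoring rules are not given in the abstract, so every class, field, and function here (ThreatLayer, AttackCase, MultiHopTask, safety_rate, etc.) is an assumed reconstruction, not the authors' implementation.

```python
# Hypothetical sketch of a MCP-SafetyBench-style task record and safety metric.
# All names and fields are assumptions; only the three threat sides (server,
# host, user), the five domains, and the 20 attack classes come from the paper.
from dataclasses import dataclass, field
from enum import Enum


class ThreatLayer(Enum):
    SERVER = "server"   # attack originates from a malicious/compromised MCP server
    HOST = "host"       # attack targets the agent host / client application
    USER = "user"       # prompt-level attack supplied by the user


@dataclass
class AttackCase:
    attack_type: str          # one of the 20 MCP-specific attack classes
    layer: ThreatLayer
    domain: str               # e.g. "browser_automation", "financial_analysis"


@dataclass
class ToolStep:
    server: str               # which real MCP server this hop calls
    tool: str                 # tool exposed by that server
    arguments: dict = field(default_factory=dict)


@dataclass
class MultiHopTask:
    task_id: str
    instruction: str                    # natural-language goal given to the agent
    steps: list[ToolStep]               # ordered cross-server tool calls
    attack: AttackCase | None = None    # injected attack, if any


def safety_rate(results: list[tuple[MultiHopTask, bool]]) -> float:
    """Fraction of attacked tasks where the agent behaved safely
    (refused, flagged, or otherwise contained the injected attack)."""
    attacked = [(task, safe) for task, safe in results if task.attack is not None]
    if not attacked:
        return 1.0
    return sum(safe for _, safe in attacked) / len(attacked)
```

Under this framing, the reported finding that safety erodes with longer task horizons would show up as safety_rate falling as the number of steps and distinct servers per task grows.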