MCPZoo: A Large-Scale Dataset of Runnable Model Context Protocol Servers for AI Agent

📅 2025-12-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Empirical research on the Model Context Protocol (MCP) is hindered by the absence of large-scale, accessible datasets of MCP servers. To address this gap, we introduce MCPZoo, an open dataset of MCP servers that the authors describe as the largest and most comprehensive to date. We collected 90,146 publicly available MCP servers from multiple public sources; over 10,000 of them were deployed and confirmed to be runnable and interactable. Our methodology combines multi-source harvesting, automated runtime validation, unified metadata modeling, and abstraction of REST/gRPC interfaces, so that experiments are not limited to static analysis. MCPZoo supports empirical investigation of the MCP ecosystem, including AI agent tool invocation, security analysis, and protocol-level ecosystem assessment, without manual deployment effort. The dataset is publicly released to support community research and development on MCP.
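As an illustration of what an automated runtime check might involve, the sketch below builds the MCP `initialize` request (a JSON-RPC 2.0 message) and tests whether a raw reply looks like a successful handshake. This is not the MCPZoo pipeline, whose validation procedure is not described in this summary. The function names, the client name `runnable-probe`, and the chosen protocol revision string are assumptions made for the example, and the transport (stdio or HTTP) is deliberately left out.

```python
import json

# Assumption: the protocol revision to request; pick one the target server supports.
PROTOCOL_VERSION = "2025-06-18"


def build_initialize_request(request_id: int = 1) -> str:
    """Serialize an MCP `initialize` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client identity used only for this sketch.
            "clientInfo": {"name": "runnable-probe", "version": "0.1"},
        },
    })


def is_valid_initialize_response(raw: str, request_id: int = 1) -> bool:
    """Return True if `raw` parses as a successful reply to `initialize`."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(msg, dict):
        return False
    # The reply must be JSON-RPC 2.0 and echo the request id.
    if msg.get("jsonrpc") != "2.0" or msg.get("id") != request_id:
        return False
    result = msg.get("result")
    if not isinstance(result, dict):
        return False  # error replies carry "error" instead of "result"
    required = ("protocolVersion", "capabilities", "serverInfo")
    return all(key in result for key in required)
```

A probe built on this would send the request over the server's transport, read one reply, and record the server as interactable only if the check passes. A stricter probe could go on to call `tools/list` before marking a server as usable.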

📝 Abstract

Model Context Protocol (MCP) enables agents to interact with external tools, yet empirical research on MCP is hindered by the lack of large-scale, accessible datasets. We present MCPZoo, the largest and most comprehensive dataset of MCP servers collected from multiple public sources, comprising 90,146 servers. MCPZoo includes over ten thousand server instances that have been deployed and verified as runnable and interactable, supporting realistic experimentation beyond static analysis. The dataset provides unified metadata and access interfaces, enabling systematic exploration and interaction without manual deployment effort. MCPZoo is released as an open and accessible resource to support research on MCP-based security analysis.
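To make the idea of unified metadata concrete, here is a minimal sketch of how a record for one server could be modeled and queried. The abstract does not give the actual MCPZoo schema, so every field name below (`name`, `source`, `verified_runnable`, `endpoint`, `tools`) and the helper `runnable_with_tool` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ServerRecord:
    """Hypothetical unified metadata for one MCP server."""
    name: str
    source: str                      # where the server was collected from
    verified_runnable: bool          # True if it was deployed and answered
    endpoint: Optional[str] = None   # access URL, if the server is deployed
    tools: Tuple[str, ...] = ()      # tool names the server exposes


def runnable_with_tool(records: Iterable[ServerRecord], tool: str) -> List[ServerRecord]:
    """Select verified-runnable servers that expose the named tool."""
    return [r for r in records if r.verified_runnable and tool in r.tools]


# Placeholder records for illustration only; none of these are real entries.
records = [
    ServerRecord("alpha", "registry-a", True, "https://alpha.example.invalid", ("search", "fetch")),
    ServerRecord("beta", "registry-b", False, None, ("search",)),
    ServerRecord("gamma", "registry-a", True, "https://gamma.example.invalid", ("translate",)),
]

selected = runnable_with_tool(records, "search")
```

With one schema across all collection sources, an experiment can select servers by capability and runnable status with a single filter instead of source-specific parsing.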

Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale datasets for MCP empirical research

Need for runnable, interactable MCP servers for realistic experimentation

Absence of unified metadata and access interfaces for systematic exploration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Largest dataset of MCP servers to date (90,146 servers from multiple public sources)

Over ten thousand instances deployed and verified as runnable, for realistic experimentation

Unified metadata and interfaces for systematic access
