🤖 AI Summary
Existing empirical research on the Model Context Protocol (MCP) is hindered by the absence of large-scale, accessible datasets of MCP servers. To address this gap, we introduce MCPZoo, the first open-source, executable, large-scale MCP server dataset. We collected 90,146 publicly available MCP servers from multiple sources; over 10,000 of them were deployed and verified as runnable and interactable. Our methodology combines multi-source harvesting, automated runtime validation, unified metadata modeling, and abstraction of REST/gRPC interfaces, so experiments are not limited to static analysis. MCPZoo enables empirical investigation of the MCP ecosystem and supports reproducible research on AI agent tool invocation, security analysis, and protocol-level ecosystem assessment. The dataset is publicly released to foster community-driven MCP research and development.
📝 Abstract
The Model Context Protocol (MCP) enables agents to interact with external tools, yet empirical research on MCP is hindered by the lack of large-scale, accessible datasets. We present MCPZoo, the largest and most comprehensive dataset of MCP servers collected from multiple public sources, comprising 90,146 servers. MCPZoo includes over ten thousand server instances that have been deployed and verified as runnable and interactable, supporting realistic experimentation beyond static analysis. The dataset provides unified metadata and access interfaces, enabling systematic exploration and interaction without manual deployment effort. MCPZoo is released as an open and accessible resource to support research on MCP-based security analysis.