🤖 AI Summary
Problem: The Model Context Protocol (MCP) ecosystem lacks a systematic, empirical resource for rigorous analysis and evaluation.
Method: This paper introduces MCPCorpus, the first large-scale, extensible dataset of MCP components, comprising roughly 14,000 servers and 300 clients. It uses automated crawling, structural normalization, metadata annotation, and GitHub activity analysis to represent protocol implementations in a standardized form. It also provides supporting tools for automatic synchronization, consistency validation, and lightweight web-based retrieval, enabling dynamic updates and longitudinal ecosystem studies.
Contribution/Results: MCPCorpus establishes the first comprehensive, reproducible observational baseline for the MCP ecosystem. It supports quantitative analysis of protocol adoption trends, implementation diversity, and ecosystem health. It can also serve as empirical infrastructure for security auditing, interoperability assessment, and standards evolution, supporting evidence-driven development and governance of MCP.
📝 Abstract
The Model Context Protocol (MCP) has recently emerged as a standardized interface for connecting language models with external tools and data. As the ecosystem rapidly expands, the lack of a structured, comprehensive view of existing MCP artifacts presents challenges for research. To bridge this gap, we introduce MCPCorpus, a large-scale dataset containing around 14K MCP servers and 300 MCP clients. Each artifact is annotated with 20+ normalized attributes capturing its identity, interface configuration, GitHub activity, and metadata. MCPCorpus provides a reproducible snapshot of the real-world MCP ecosystem, enabling studies of adoption trends, ecosystem health, and implementation diversity. To keep pace with the rapid evolution of the MCP ecosystem, we provide utility tools for automated data synchronization, normalization, and inspection. Furthermore, to support efficient exploration and exploitation, we release a lightweight web-based search interface. MCPCorpus is publicly available at: https://github.com/Snakinya/MCPCorpus.
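To make the idea of a normalized artifact record concrete, here is a minimal sketch of how one entry might be represented and cleaned. It groups fields under the four categories named in the abstract (identity, interface configuration, GitHub activity, metadata). Every field name, the `normalize` and `summarize` helpers, and the example values are assumptions for illustration. They are not the dataset's actual schema or tooling.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ArtifactRecord:
    """One MCP artifact. All field names here are hypothetical."""
    # Identity
    name: str
    kind: str  # "server" or "client"
    repo_url: str
    # Interface configuration (assumed field)
    transport: str = "unknown"
    # GitHub activity (assumed field)
    stars: int = 0
    # Free-form metadata (assumed field)
    tags: List[str] = field(default_factory=list)


def normalize(raw: Dict[str, Any]) -> ArtifactRecord:
    """Map a loosely structured crawl result onto the fixed record shape."""
    raw_tags = raw.get("tags") or []
    return ArtifactRecord(
        name=str(raw.get("name", "")).strip(),
        kind=str(raw.get("kind", "server")).strip().lower(),
        repo_url=str(raw.get("repo_url", "")).strip().rstrip("/"),
        transport=str(raw.get("transport") or "unknown").strip().lower(),
        stars=int(raw.get("stars") or 0),
        # Lowercase, drop empty tags, de-duplicate, and sort for stable output.
        tags=sorted({t.strip().lower() for t in raw_tags if t.strip()}),
    )


def summarize(records: List[ArtifactRecord]) -> Dict[str, int]:
    """Count artifacts per kind, as a starting point for adoption statistics."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record.kind] = counts.get(record.kind, 0) + 1
    return counts
```

With a fixed record shape like this, ecosystem-level questions such as the number of servers versus clients reduce to simple aggregations over the records. The real dataset carries 20+ attributes per artifact, so its actual schema should be taken from the repository rather than from this sketch.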