Evaluating Tool Cloning in Agentic-AI Ecosystems

📅 2026-05-10

🤖 AI Summary

This study examines how widespread cloning, light modification, and template-based derivation are in agent tool ecosystems; such duplication inflates perceived diversity and introduces risks such as benchmark contamination, vulnerability propagation, and intellectual property concerns. In what they describe as the first large-scale empirical analysis, the authors construct a unified dataset of 8,861 repositories and 100,011 tool entries and detect implementation-level duplication through lexical similarity (Jaccard index), fuzzy structural hashing (ssdeep), pairwise comparisons within and across the MCP and Skills ecosystems, and manual validation. In the MCP ecosystem, 60% of high-Jaccard candidate pairs and 85% of high-ssdeep candidate pairs were manually verified as clones, which indicates how extensive code replication is. The results suggest that future datasets and benchmarks should account for code provenance and implementation similarity.
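To make the lexical metric concrete, the following is a minimal sketch of a Jaccard comparison over repository token sets. It is an illustration under stated assumptions, not the authors' pipeline: the tokenizer, the 0.8 threshold, and the names `tokens`, `jaccard`, and `candidate_pairs` are all hypothetical, and the summary does not say how the paper tokenizes code or where it places its "high similarity" cutoff.

```python
import re
from itertools import combinations

# Assumption: identifiers and integer literals serve as lexical tokens.
# The paper's actual tokenization is not described in this summary.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+")


def tokens(source: str) -> set[str]:
    """Return the set of identifier and number tokens in a source string."""
    return set(TOKEN_RE.findall(source))


def jaccard(a: set[str], b: set[str]) -> float:
    """Compute |A ∩ B| / |A ∪ B|; two empty sets are given similarity 0."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def candidate_pairs(repos: dict[str, str], threshold: float = 0.8):
    """Yield (name_a, name_b, score) for each pair scoring at or above threshold.

    The threshold is a hypothetical value chosen for this example.
    """
    token_sets = {name: tokens(src) for name, src in repos.items()}
    for a, b in combinations(sorted(token_sets), 2):
        score = jaccard(token_sets[a], token_sets[b])
        if score >= threshold:
            yield a, b, score


# Toy inputs: each value stands in for the concatenated source of a repository.
repos = {
    "repo_a": "def get_weather(city): return fetch(city)",
    "repo_b": "def get_weather(city): return fetch(city)  # copied",
    "repo_c": "class Calculator: pass",
}
pairs = list(candidate_pairs(repos))
```

Pairs flagged this way are only candidates. As the reported 60% verification rate for high-Jaccard pairs suggests, a high lexical score does not by itself establish that one repository was cloned from another.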

📝 Abstract

Agent tools are becoming a core interface through which LLM agents access external data, services, and execution environments. As these tools are distributed through public marketplaces, raw tool counts may substantially overstate ecosystem diversity if many repositories are cloned, lightly modified, or derived from shared templates. Such hidden duplication can contaminate benchmark splits, propagate vulnerable implementations, bias measurements of tool-use generalization, and raise provenance, attribution, and intellectual-property concerns. We present, to our knowledge, the first large-scale measurement study of tool cloning in agentic AI ecosystems. We curate a unified dataset from multiple public platforms, covering 7,508 Model Context Protocol (MCP) repositories with 87,564 extracted tools and 1,353 Skills repositories with 12,447 tools, for a total of 8,861 repositories and 100,011 tool entries. To measure implementation-level duplication, we build a repository-level auditing pipeline using complementary lexical and fuzzy-structural similarity metrics, and compute pairwise similarity across MCP-to-MCP, Skills-to-Skills, and MCP-to-Skills repository pairs. We further manually verify 100 sampled pairs in each of the MCP and Skills ecosystems, drawn across similarity-score buckets, to calibrate how often high similarity reflects true code cloning. Our analysis shows that cloning is not an isolated artifact: high-similarity regions appear across comparison settings, and 60% of high-Jaccard candidates and 85% of high-ssdeep candidates in the MCP ecosystem are manually verified as clones. These results indicate that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems. They further suggest that agent-tool datasets and benchmarks should account for repository provenance and implementation similarity when measuring tool diversity or constructing evaluation splits.
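One way to act on the abstract's closing recommendation is to group repositories into clone clusters and keep each cluster on one side of an evaluation split. The sketch below is a hypothetical illustration, not a procedure from the paper: it assumes a list of verified clone pairs is already available, merges them with a simple union-find, and assigns whole clusters to a split by hashing the cluster representative. The function names and the `test_fraction` parameter are invented for this example.

```python
import hashlib


def cluster_clones(repos, clone_pairs):
    """Group repositories into clusters connected by clone pairs (union-find)."""
    parent = {r: r for r in repos}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in clone_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Use the lexicographically smaller name as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters = {}
    for r in repos:
        clusters.setdefault(find(r), []).append(r)
    return clusters


def assign_splits(clusters, test_fraction=0.2):
    """Assign each whole cluster to "train" or "test" deterministically.

    Hashing the cluster representative keeps the assignment stable across
    runs; test_fraction is the approximate share of clusters sent to test.
    """
    splits = {}
    for root, members in clusters.items():
        digest = hashlib.sha256(root.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        split = "test" if bucket < test_fraction else "train"
        for member in members:
            splits[member] = split
    return splits


# Toy inputs: "a", "b", and "c" form one clone cluster; "d" stands alone.
repos = ["a", "b", "c", "d"]
clone_pairs = [("a", "b"), ("b", "c")]
clusters = cluster_clones(repos, clone_pairs)
splits = assign_splits(clusters)
```

Because the split is decided per cluster rather than per repository, a tool and its clones cannot end up on opposite sides of the train/test boundary, which is the contamination scenario the abstract warns about.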

Problem

Research questions and friction points this paper is trying to address.

tool cloning

agentic AI

ecosystem diversity

hidden duplication

benchmark contamination

Innovation

Methods, ideas, or system contributions that make the work stand out.

tool cloning

agentic AI

code similarity