Don't believe everything you read: Understanding and Measuring MCP Behavior under Misleading Tool Descriptions

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

career value

237K/year

🤖 AI Summary

This study addresses a novel security threat arising from inconsistencies between tool descriptions and their actual code implementations in the Model Context Protocol (MCP) ecosystem, which may mislead AI agents into performing unauthorized or high-risk operations. We present the first large-scale analysis of description-code consistency across 10,240 MCP servers, leveraging an automated static analysis framework to systematically evaluate the impact of such discrepancies on AI decision-making. Our findings reveal that approximately 13% of servers exhibit severe inconsistencies capable of enabling unauthorized financial transactions, covert state manipulation, and other critical risks. Furthermore, we uncover significant variations in the prevalence and nature of these inconsistencies across different tool categories and deployment platforms, establishing description-code divergence as a widespread and hazardous emerging attack surface in AI-integrated systems.

Technology Category

Application Category

📝 Abstract

The Model Context Protocol (MCP) enables large language models to invoke external tools through natural-language descriptions, forming the foundation of many AI agent applications. However, MCP does not enforce consistency between documented tool behavior and actual code execution, even though MCP Servers often run with broad system privileges. This gap introduces a largely unexplored security risk. We study how mismatches between externally presented tool descriptions and underlying implementations systematically shape the mental models and decision-making behavior of intelligent agents. Specifically, we present the first large-scale study of description-code inconsistency in the MCP ecosystem. We design an automated static analysis framework and apply it to 10,240 real-world MCP Servers across 36 categories. Our results show that while most servers are highly consistent, approximately 13% exhibit substantial mismatches that can enable undocumented privileged operations, hidden state mutations, or unauthorized financial actions. We further observe systematic differences across application categories, popularity levels, and MCP marketplaces. Our findings demonstrate that description-code inconsistency is a concrete and prevalent attack surface in MCP-based AI agents, and motivate the need for systematic auditing and stronger transparency guarantees in future agent ecosystems.

Problem

Research questions and friction points this paper is trying to address.

Model Context Protocol

tool description inconsistency

AI agent security

description-code mismatch

privileged operations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Context Protocol

description-code inconsistency

static analysis