Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

📅 2025-12-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses semantic attacks—specifically tool poisoning, context pollution, and description tampering—arising from malicious tool metadata in Model Context Protocol (MCP) systems. To mitigate these threats without model fine-tuning, we propose the first multi-layer defense framework for MCP. Our method integrates three complementary components: (1) RSA-based cryptographic signature verification for tool metadata integrity; (2) an LLM-on-LLM semantic self-inspection mechanism to detect inconsistent or adversarial tool descriptions; and (3) heuristic runtime monitoring to identify anomalous tool invocation patterns. We systematically identify and categorize three MCP-specific semantic attack vectors—the first such taxonomy in the literature. Evaluations on GPT-4, DeepSeek-V2, and Llama-3.5 demonstrate that our framework blocks 71% of unsafe tool calls while imposing no architectural modifications or retraining requirements, significantly enhancing MCP security with minimal overhead.

Technology Category

Application Category

📝 Abstract
The Model Context Protocol (MCP) enables Large Language Models to integrate external tools through structured descriptors, increasing autonomy in decision-making, task execution, and multi-agent workflows. However, this autonomy creates a largely overlooked security gap. Existing defenses focus on prompt-injection attacks and fail to address threats embedded in tool metadata, leaving MCP-based systems exposed to semantic manipulation. This work analyzes three classes of semantic attacks on MCP-integrated systems: (1) Tool Poisoning, where adversarial instructions are hidden in tool descriptors; (2) Shadowing, where trusted tools are indirectly compromised through contaminated shared context; and (3) Rug Pulls, where descriptors are altered after approval to subvert behavior. To counter these threats, we introduce a layered security framework with three components: RSA-based manifest signing to enforce descriptor integrity, LLM-on-LLM semantic vetting to detect suspicious tool definitions, and lightweight heuristic guardrails that block anomalous tool behavior at runtime. Through evaluation of GPT-4, DeepSeek, and Llama-3.5 across eight prompting strategies, we find that security performance varies widely by model architecture and reasoning method. GPT-4 blocks about 71 percent of unsafe tool calls, balancing latency and safety. DeepSeek shows the highest resilience to Shadowing attacks but with greater latency, while Llama-3.5 is fastest but least robust. Our results show that the proposed framework reduces unsafe tool invocation rates without model fine-tuning or internal modification.
Problem

Research questions and friction points this paper is trying to address.

Securing Model Context Protocol against tool poisoning and adversarial attacks.
Addressing overlooked security gaps in LLM tool integration and metadata threats.
Developing a layered defense framework to ensure tool descriptor integrity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

RSA-based manifest signing ensures tool descriptor integrity
LLM-on-LLM semantic vetting detects suspicious tool definitions
Lightweight heuristic guardrails block anomalous tool behavior at runtime
🔎 Similar Papers
No similar papers found.