🤖 AI Summary
Model Context Protocol (MCP) operates in a non-isolated execution environment, leaving it vulnerable to tool poisoning and indirect prompt injection attacks that can cause conversation hijacking, misdirection, and data leakage. Existing detection approaches, whether rule-based or LLM-driven, suffer from poor risk quantification and low efficiency. To address this, we propose SecMCP, the first security framework for MCP built on latent polytope modeling: it represents LLM activation vector trajectories as geometric polytopes, enabling quantifiable detection of conversation drift. Our method integrates activation analysis, trajectory modeling, and anomaly detection. Evaluated on MS MARCO, HotpotQA, and FinQA benchmarks across Llama3, Vicuna, and Mistral models, SecMCP achieves AUROC > 0.915, significantly outperforming baselines, while maintaining high detection accuracy and system usability.
📝 Abstract
The Model Context Protocol (MCP) enhances large language models (LLMs) by integrating external tools, enabling dynamic aggregation of real-time data to improve task execution. However, its non-isolated execution context introduces critical security and privacy risks. In particular, adversarially crafted content can induce tool poisoning or indirect prompt injection, leading to conversation hijacking, misinformation propagation, or data exfiltration. Existing defenses, such as rule-based filters or LLM-driven detection, remain inadequate due to their reliance on static signatures, computational inefficiency, and inability to quantify conversational hijacking. To address these limitations, we propose SecMCP, a secure framework that detects and quantifies conversation drift: deviations in latent space trajectories induced by adversarial external knowledge. By modeling LLM activation vectors within a latent polytope space, SecMCP identifies anomalous shifts in conversational dynamics, enabling proactive detection of hijacking, misdirection, and data exfiltration. We evaluate SecMCP on three state-of-the-art LLMs (Llama3, Vicuna, Mistral) across benchmark datasets (MS MARCO, HotpotQA, FinQA), demonstrating robust detection with AUROC scores exceeding 0.915 while maintaining system usability. Our contributions include a systematic categorization of MCP security threats, a novel latent polytope-based methodology for quantifying conversation drift, and empirical validation of SecMCP's efficacy.
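To make the "latent polytope" intuition concrete, here is a minimal sketch of drift detection over activation vectors. This is not the paper's actual method: it substitutes a simple axis-aligned bounding polytope (a padded box fit to benign activations) for whatever polytope construction SecMCP uses, and the dimensions, margin, and scoring function are illustrative assumptions. A new activation that lies outside the fitted region receives a positive drift score.

```python
import numpy as np

def fit_polytope(benign_acts, margin=0.1):
    """Fit an axis-aligned bounding polytope (box) around benign
    activation vectors, padded by a relative margin.
    NOTE: a stand-in for SecMCP's actual polytope construction."""
    lo = benign_acts.min(axis=0)
    hi = benign_acts.max(axis=0)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad

def drift_score(polytope, act):
    """Sum of per-dimension distances outside the polytope.
    0.0 means the activation lies inside (no drift flagged)."""
    lo, hi = polytope
    below = np.clip(lo - act, 0.0, None)
    above = np.clip(act - hi, 0.0, None)
    return float((below + above).sum())

rng = np.random.default_rng(0)
# Stand-in "activation vectors" from benign conversation turns.
benign = rng.normal(0.0, 1.0, size=(200, 8))
poly = fit_polytope(benign)

in_dist = rng.normal(0.0, 1.0, size=8)    # a typical benign-like turn
hijacked = rng.normal(6.0, 1.0, size=8)   # simulated drifted trajectory point

print(drift_score(poly, in_dist))
print(drift_score(poly, hijacked))        # clearly positive: outside the polytope
```

In a real deployment the polytope would be fit over projected hidden-state trajectories rather than raw vectors, and the score thresholded (e.g., to sweep an AUROC curve), but the core idea of quantifying drift as geometric excursion from a benign region is the same.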