🤖 AI Summary
Model Context Protocol (MCP) operates in a non-isolated execution environment, leaving it vulnerable to tool poisoning and indirect prompt injection attacks that can cause conversation hijacking, misdirection, and data leakage. Existing detection approaches, whether rule-based or LLM-driven, suffer from poor risk quantification and low efficiency. To address this, we propose SecMCP, the first security framework for MCP built on latent polytope modeling: it represents LLM activation vector trajectories as geometric polytopes, enabling quantifiable detection of conversation drift. Our method integrates activation analysis, trajectory modeling, and anomaly detection. Evaluated on MS MARCO, HotpotQA, and FinQA benchmarks across Llama3, Vicuna, and Mistral models, SecMCP achieves AUROC > 0.915, significantly outperforming baselines, while maintaining high detection accuracy and system usability.
📝 Abstract
The Model Context Protocol (MCP) enhances large language models (LLMs) by integrating external tools, enabling dynamic aggregation of real-time data to improve task execution. However, its non-isolated execution context introduces critical security and privacy risks. In particular, adversarially crafted content can induce tool poisoning or indirect prompt injection, leading to conversation hijacking, misinformation propagation, or data exfiltration. Existing defenses, such as rule-based filters or LLM-driven detection, remain inadequate due to their reliance on static signatures, computational inefficiency, and inability to quantify conversational hijacking. To address these limitations, we propose SecMCP, a secure framework that detects and quantifies conversation drift: deviations in latent space trajectories induced by adversarial external knowledge. By modeling LLM activation vectors within a latent polytope space, SecMCP identifies anomalous shifts in conversational dynamics, enabling proactive detection of hijacking, misdirection, and data exfiltration. We evaluate SecMCP on three state-of-the-art LLMs (Llama3, Vicuna, Mistral) across benchmark datasets (MS MARCO, HotpotQA, FinQA), demonstrating robust detection with AUROC scores exceeding 0.915 while maintaining system usability. Our contributions include a systematic categorization of MCP security threats, a novel latent polytope-based methodology for quantifying conversation drift, and empirical validation of SecMCP's efficacy.
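To make the "latent polytope" intuition concrete, here is a minimal sketch of drift detection over activation vectors. This is not the paper's actual method: it substitutes a simple axis-aligned bounding polytope (a padded box fit to benign activations) for whatever polytope construction SecMCP uses, and the dimensions, margin, and scoring function are illustrative assumptions. A new activation that lies outside the fitted region receives a positive drift score.

```python
import numpy as np

def fit_polytope(benign_acts, margin=0.1):
    """Fit an axis-aligned bounding polytope (box) around benign
    activation vectors, padded by a relative margin.
    NOTE: a stand-in for SecMCP's actual polytope construction."""
    lo = benign_acts.min(axis=0)
    hi = benign_acts.max(axis=0)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad

def drift_score(polytope, act):
    """Sum of per-dimension distances outside the polytope.
    0.0 means the activation lies inside (no drift flagged)."""
    lo, hi = polytope
    below = np.clip(lo - act, 0.0, None)
    above = np.clip(act - hi, 0.0, None)
    return float((below + above).sum())

rng = np.random.default_rng(0)
# Stand-in "activation vectors" from benign conversation turns.
benign = rng.normal(0.0, 1.0, size=(200, 8))
poly = fit_polytope(benign)

in_dist = rng.normal(0.0, 1.0, size=8)    # a typical benign-like turn
hijacked = rng.normal(6.0, 1.0, size=8)   # simulated drifted trajectory point

print(drift_score(poly, in_dist))
print(drift_score(poly, hijacked))        # clearly positive: outside the polytope
```

In a real deployment the polytope would be fit over projected hidden-state trajectories rather than raw vectors, and the score thresholded (e.g., to sweep an AUROC curve), but the core idea of quantifying drift as geometric excursion from a benign region is the same.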