🤖 AI Summary
This work addresses the scalability limitations of traditional tool-by-tool invocation under the Model Context Protocol (MCP), which incurs high coordination overhead, fragments state management, and limits support for wide-context operations as tool catalogs grow. We study Code Execution MCP (CE-MCP), an approach that consolidates complex workflows into a single program executed in an isolated runtime. We formalize the distinction between context-coupled (traditional) and context-decoupled (CE-MCP) architectures and evaluate both with the MCP-Bench framework across 10 representative servers, showing that CE-MCP significantly reduces token usage and execution latency but also vastly expands the attack surface. Applying the MAESTRO framework, we identify sixteen attack classes across five execution phases, validate them through adversarial scenarios across multiple LLMs, and propose a layered defense architecture comprising containerized sandboxing and semantic gating.
📝 Abstract
Model Context Protocols (MCPs) provide a unified platform for agent systems to discover, select, and orchestrate tools across heterogeneous execution environments. As MCP-based systems scale to incorporate larger tool catalogs and multiple concurrently connected MCP servers, traditional tool-by-tool invocation increases coordination overhead, fragments state management, and limits support for wide-context operations. To address these scalability challenges, recent MCP designs have incorporated code execution as a first-class capability, an approach called Code Execution MCP (CE-MCP). This enables agents to consolidate complex workflows, such as SQL querying, file analysis, and multi-step data transformations, into a single program that executes within an isolated runtime environment. In this work, we formalize the architectural distinction between context-coupled (traditional) and context-decoupled (CE-MCP) models, analyzing their fundamental scalability trade-offs. Using the MCP-Bench framework across 10 representative servers, we empirically evaluate task behavior, tool utilization patterns, execution latency, and protocol efficiency as the number of connected MCP servers and available tools increases. The results show that while CE-MCP significantly reduces token usage and execution latency, it introduces a vastly expanded attack surface. We address this security gap by applying the MAESTRO framework, identifying sixteen attack classes across five execution phases, including code-execution-specific threats such as exception-mediated code injection and unsafe capability synthesis. We validate these vulnerabilities through adversarial scenarios across multiple LLMs and propose a layered defense architecture comprising containerized sandboxing and semantic gating. Our findings provide a rigorous roadmap for balancing scalability and security in production-ready executable agent workflows.
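The context-coupled versus context-decoupled distinction can be illustrated with a toy sketch. It is not the paper's implementation: the tools `query_rows`, `filter_rows`, and `summarize` are hypothetical stand-ins for MCP server capabilities, and the character count of serialized results is only a crude proxy for token usage. The sketch does not model sandboxing or any of the security concerns discussed above.

```python
from typing import Dict, List, Tuple

# Hypothetical tools standing in for capabilities exposed by MCP servers.
def query_rows(n: int) -> List[Dict[str, int]]:
    return [{"id": i, "value": i * 10} for i in range(n)]

def filter_rows(rows: List[Dict[str, int]], minimum: int) -> List[Dict[str, int]]:
    return [r for r in rows if r["value"] >= minimum]

def summarize(rows: List[Dict[str, int]]) -> Dict[str, int]:
    return {"count": len(rows), "total": sum(r["value"] for r in rows)}

def run_context_coupled(n: int, minimum: int) -> Tuple[Dict[str, int], int]:
    """Tool-by-tool invocation: every intermediate result re-enters the context."""
    context_chars = 0
    rows = query_rows(n)
    context_chars += len(repr(rows))      # full result set is serialized back
    kept = filter_rows(rows, minimum)
    context_chars += len(repr(kept))      # filtered rows are serialized back
    summary = summarize(kept)
    context_chars += len(repr(summary))   # final answer is serialized back
    return summary, context_chars

def run_context_decoupled(n: int, minimum: int) -> Tuple[Dict[str, int], int]:
    """Single program: intermediates stay in the runtime, only the result returns."""
    def program() -> Dict[str, int]:
        return summarize(filter_rows(query_rows(n), minimum))
    summary = program()
    return summary, len(repr(summary))

coupled_summary, coupled_chars = run_context_coupled(100, 500)
decoupled_summary, decoupled_chars = run_context_decoupled(100, 500)
print(coupled_summary == decoupled_summary, coupled_chars, decoupled_chars)
```

Both paths compute the same summary, but only the coupled path pays context cost for the intermediate row sets, which is the intuition behind the token and latency reductions the abstract reports.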