MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

📅 2026-01-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of tool-augmented large language models to prompt injection and sensitive data leakage during tool execution, a challenge inadequately mitigated by static analysis techniques, which cannot capture runtime-only behaviors. The paper presents a lightweight framework that integrates WebAssembly/WASI sandboxing with runtime provenance tracking to execute untrusted tools safely via the Model Context Protocol (MCP). It dynamically traces the flow of external inputs (environment variables, file contents, and HTTP payloads) to sensitive output sinks, enabling auditable leakage detection through runtime analysis and substring matching. Case studies on three representative tools identify input leakage and filesystem privilege violations. The framework is compared against a static string-signature baseline, and micro-benchmarks characterize false negatives under input transformations and false positives from short-token collisions.
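The summary mentions extracting sensitive output sinks from runtime results. One plausible shape for that step, sketched here as an assumption rather than the paper's actual implementation (field names like `messages` and `body` are illustrative), is to flatten a structured tool return into named string sinks that a leakage matcher can scan:

```python
# Hypothetical sketch of sink extraction: flatten a structured tool
# return (and prompt/messages) into named string sinks. Every string
# leaf becomes a (path, text) entry the matcher can later inspect.

def extract_sinks(obj, prefix: str = "tool_return") -> dict[str, str]:
    """Recursively collect every string leaf as a named sink."""
    sinks: dict[str, str] = {}
    if isinstance(obj, str):
        sinks[prefix] = obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            sinks.update(extract_sinks(value, f"{prefix}.{key}"))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            sinks.update(extract_sinks(value, f"{prefix}[{i}]"))
    return sinks

result = {"messages": [{"role": "user", "content": "read /etc/hosts"}],
          "body": "secret=abc123"}
print(extract_sinks(result))
```

The path-like sink names make later findings auditable: a report can point at exactly which field of which tool return exposed an external input.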

📝 Abstract
Tool-augmented LLM agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners often focus on static artifacts, analyzing runtime behavior is challenging because directly executing untrusted tools can itself be dangerous. We present MCP-SandboxScan, a lightweight framework motivated by the Model Context Protocol (MCP) that safely executes untrusted tools inside a WebAssembly/WASI sandbox and produces auditable reports of external-to-sink exposures. Our prototype (i) extracts LLM-relevant sinks from runtime outputs (prompt/messages and structured tool-return fields), (ii) instantiates external-input candidates from environment values, mounted file contents, and output-surfaced HTTP fetch intents, and (iii) links sources to sinks via snippet-based substring matching. Case studies on three representative tools show that MCP-SandboxScan can surface provenance evidence when external inputs appear in prompt/messages or tool-return payloads, and can expose filesystem capability violations as runtime evidence. We further compare against a lightweight static string-signature baseline and use a micro-benchmark to characterize false negatives under transformations and false positives from short-token collisions.
Problem

Research questions and friction points this paper is trying to address.

tool-augmented LLM agents
runtime security risks
prompt injection
external input exposure
untrusted tool execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

WebAssembly sandboxing
runtime analysis
LLM tool security
source-sink tracking
Model Context Protocol
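The WebAssembly sandboxing idea above rests on WASI's capability model: a module can only touch directories and environment variables it was explicitly granted. A minimal harness sketch, assuming the tool is compiled to WASM and run under the `wasmtime` CLI (`--dir` and `--env` are real wasmtime flags; the module path and allowlists here are hypothetical):

```python
# Assumed harness design, not the paper's actual code: build a
# wasmtime invocation that preopens only allowlisted directories and
# forwards only allowlisted environment variables. Any other
# filesystem or env access by the tool fails inside the sandbox and
# surfaces as a capability violation in the runtime evidence.

def build_sandbox_cmd(module: str,
                      allowed_dirs: list[str],
                      allowed_env: dict[str, str]) -> list[str]:
    """Return an argv list for running a WASM tool under wasmtime."""
    cmd = ["wasmtime", "run"]
    for d in allowed_dirs:
        cmd.append(f"--dir={d}")            # preopen: only these paths exist
    for key, value in allowed_env.items():
        cmd += ["--env", f"{key}={value}"]  # explicit env allowlist
    cmd.append(module)
    return cmd

cmd = build_sandbox_cmd("tool.wasm", ["/tmp/scan"], {"API_KEY": "sk-test"})
print(" ".join(cmd))
# Execute with e.g. subprocess.run(cmd, capture_output=True, text=True)
# and scan stdout/stderr for denied-access errors and leaked inputs.
```

Because the allowlists are explicit, the same harness doubles as the oracle for the filesystem-violation case studies: a tool that reaches outside its preopened directory produces a deterministic runtime error rather than a silent read.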