🤖 AI Summary
This paper addresses security threats to tool-augmented large language models (LLMs) deployed on edge devices for smart-home applications, specifically within Model Context Protocol (MCP)-based agent chains. We propose AegisMCP, a protocol-level online intrusion detection system. Methodologically, we design NEBULA-Schema, a protocol-level instrumentation that represents MCP activity as a streaming heterogeneous temporal graph; the detector built on it jointly uses session-level directed acyclic graphs (DAGs), install/permission transitions over time, and attribute features, enabling lightweight CPU-only (GPU-free) inference. We further construct a minimal attack suite (instruction-driven escalation, chain-of-tool exfiltration, malicious MCP server registration, and persistence) to support training and evaluation. Experiments on Intel N150-class edge hardware achieve sub-second end-to-end alerting latency while outperforming traffic-only and sequence-only baselines. Ablation studies confirm that the session-DAG structure and install/permission signals are key contributors to detection performance.
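As an illustration of the graph representation described above, the following Python sketch models each protocol-level observation as a typed, timestamped edge between the node kinds named in the abstract (agents, MCP servers, tools, devices, remotes, sessions). It keeps a per-session adjacency that is expected to stay acyclic and tracks permission levels over time. The event schema, the integer permission levels, and the names `NodeType`, `Event`, and `StreamingGraph` are hypothetical assumptions made for this sketch; they are not the paper's NEBULA-Schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    # Node kinds listed in the abstract; this enum itself is hypothetical.
    AGENT = "agent"
    MCP_SERVER = "mcp_server"
    TOOL = "tool"
    DEVICE = "device"
    REMOTE = "remote"
    SESSION = "session"


@dataclass(frozen=True)
class Event:
    """One timestamped, typed edge seen at the protocol level (hypothetical schema)."""
    ts: float
    session: str
    src: tuple          # (NodeType, name)
    dst: tuple          # (NodeType, name)
    relation: str       # e.g. "calls", "sends_to"
    permission: int = 0  # hypothetical ordinal permission level


@dataclass
class StreamingGraph:
    edges: list = field(default_factory=list)
    # Per-session adjacency; a benign session is assumed to form a DAG.
    session_adj: dict = field(default_factory=lambda: defaultdict(lambda: defaultdict(set)))
    last_permission: dict = field(default_factory=dict)

    def add(self, ev: Event) -> list:
        """Append one event and return toy structural/permission flags."""
        flags = []
        self.edges.append(ev)
        adj = self.session_adj[ev.session]
        # Adding src -> dst closes a cycle iff dst can already reach src.
        if self._reachable(adj, ev.dst, ev.src):
            flags.append("cycle_in_session_dag")
        adj[ev.src].add(ev.dst)
        # Temporal permission transition: flag any increase within a session.
        prev = self.last_permission.get(ev.session)
        if prev is not None and ev.permission > prev:
            flags.append("permission_escalation")
        self.last_permission[ev.session] = ev.permission
        return flags

    @staticmethod
    def _reachable(adj, start, goal) -> bool:
        """Iterative DFS over the session adjacency."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj.get(node, ()))
        return False


g = StreamingGraph()
agent = (NodeType.AGENT, "assistant")
tool = (NodeType.TOOL, "read_file")
remote = (NodeType.REMOTE, "203.0.113.7")
events = [
    Event(0.0, "s1", agent, tool, "calls", permission=1),
    Event(0.5, "s1", tool, remote, "sends_to", permission=2),
    Event(0.9, "s1", remote, agent, "triggers", permission=2),
]
flags = [g.add(ev) for ev in events]
print(flags)  # [[], ['permission_escalation'], ['cycle_in_session_dag']]
```

Flagging a cycle or a permission increase is only a toy stand-in for a learned detector; the point is that such a graph makes these structural and temporal signals cheap to compute incrementally on a CPU.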
📝 Abstract
In this work, we study the security of Model Context Protocol (MCP) agent toolchains and their application in smart homes. We introduce AegisMCP, a protocol-level intrusion detector. Our contributions are: (i) a minimal attack suite spanning instruction-driven escalation, chain-of-tool exfiltration, malicious MCP server registration, and persistence; (ii) NEBULA-Schema (Network-Edge Behavioral Learning for Untrusted LLM Agents), a reusable protocol-level instrumentation that represents MCP activity as a streaming heterogeneous temporal graph over agents, MCP servers, tools, devices, remotes, and sessions; and (iii) a CPU-only streaming detector that fuses novelty, session-DAG structure, and attribute cues for near-real-time edge inference, with optional fusion of local prompt-guardrail signals. On an emulated smart-home testbed spanning multiple MCP stacks, and on a physical bench, AegisMCP consistently achieves sub-second per-window model inference and end-to-end alerting latency on Intel N150-class edge hardware, while outperforming traffic-only and sequence baselines; ablations confirm the importance of the session-DAG and install/permission signals. We release code, schemas, and generators for reproducible evaluation.
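The abstract does not say how novelty, session-DAG structure, and attribute cues are fused. The sketch below assumes a simple weighted linear fusion over one window of edges. Novelty is the fraction of edge triples unseen in a benign baseline; the DAG cue is the longest tool-call chain, computed with Kahn's topological sort under the assumption that the window is acyclic; the attribute cue fires on any new MCP server install. The function names, the weights, `depth_cap`, and the optional `guardrail` blending are hypothetical choices for illustration only, not AegisMCP's detector.

```python
import time
from collections import defaultdict, deque
from typing import List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (src, relation, dst): hypothetical flat encoding


def longest_chain(edges: List[Edge]) -> int:
    """Longest path length (in edges) via Kahn's algorithm; assumes a DAG."""
    adj, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for src, _, dst in edges:
        adj[src].append(dst)
        indeg[dst] += 1
        nodes.update((src, dst))
    depth = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(depth.values(), default=0)


def novelty(edges: List[Edge], baseline: Set[Edge]) -> float:
    """Fraction of edge triples in the window never seen in the benign baseline."""
    if not edges:
        return 0.0
    return sum(e not in baseline for e in edges) / len(edges)


def window_score(edges: List[Edge], baseline: Set[Edge], new_installs: int,
                 guardrail: Optional[float] = None, w=(0.4, 0.3, 0.3),
                 guard_w: float = 0.2, depth_cap: int = 5) -> float:
    """Hypothetical linear fusion of novelty, DAG depth, and an install cue."""
    n = novelty(edges, baseline)
    d = min(longest_chain(edges) / depth_cap, 1.0)
    a = 1.0 if new_installs > 0 else 0.0
    score = w[0] * n + w[1] * d + w[2] * a
    if guardrail is not None:  # optional local prompt-guardrail signal in [0, 1]
        score = (1 - guard_w) * score + guard_w * guardrail
    return score


baseline = {("agent", "calls", "get_temp"), ("get_temp", "reads", "thermostat")}
benign = [("agent", "calls", "get_temp"), ("get_temp", "reads", "thermostat")]
attack = [("agent", "calls", "read_file"), ("read_file", "pipes_to", "http_post"),
          ("http_post", "sends_to", "remote:203.0.113.7")]

t0 = time.perf_counter()
s_benign = window_score(benign, baseline, new_installs=0)
s_attack = window_score(attack, baseline, new_installs=1, guardrail=0.9)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"benign={s_benign:.3f} attack={s_attack:.3f} ({elapsed_ms:.3f} ms)")
```

A linear fusion keeps per-window cost proportional to the number of edges, in the spirit of the CPU-only, sub-second budget the abstract reports; a real detector would learn the weights and calibrate an alert threshold rather than fix them by hand.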