🤖 AI Summary
Tool Poisoning Attacks (TPAs) in the Model Context Protocol (MCP) manipulate LLM agents by corrupting tool descriptions, inducing unauthorized actions without the poisoned tool ever being executed and thereby evading conventional behavior-level defenses.
Method: We propose a decision-level defense paradigm that links LLM attention mechanisms to decision logic, constructing a Decision Dependence Graph (DDG) for pre-invocation provenance tracing and attack-source attribution. Our approach integrates attention-driven decision tracking, DDG modeling, graph-structural anomaly detection, and security-policy transfer from Program Dependence Graphs (PDGs).
Contribution/Results: Evaluated on real-world datasets, our method achieves 94%–99% average precision in detecting poisoned invocations and 95%–100% attribution accuracy, with inference latency under one second and zero additional token overhead. This is the first work to leverage attention dynamics for decision-level TPA detection and root-cause attribution in MCP-based agent systems.
📝 Abstract
The Model Context Protocol (MCP) is increasingly adopted to standardize the interaction between LLM agents and external tools. However, this trend introduces a new threat: Tool Poisoning Attacks (TPA), where tool metadata is poisoned to induce the agent to perform unauthorized operations. Existing defenses that primarily focus on behavior-level analysis are fundamentally ineffective against TPA, as poisoned tools need not be executed, leaving no behavioral trace to monitor.
Thus, we propose MindGuard, a decision-level guardrail for LLM agents that provides provenance tracking of call decisions, policy-agnostic detection, and poisoning-source attribution against TPA. While fully explaining LLM decisions remains challenging, our empirical findings uncover a strong correlation between LLM attention mechanisms and tool-invocation decisions. We therefore adopt attention as an empirical signal for decision tracking and formalize it as the Decision Dependence Graph (DDG), which models the LLM's reasoning process as a weighted, directed graph whose vertices represent logical concepts and whose edges quantify attention-based dependencies. We further design robust DDG construction and graph-based anomaly analysis mechanisms that efficiently detect and attribute TPAs. Extensive experiments on real-world datasets demonstrate that MindGuard achieves 94%–99% average precision in detecting poisoned invocations and 95%–100% attribution accuracy, with processing times under one second and no additional token cost. Moreover, the DDG can be viewed as an adaptation of the classical Program Dependence Graph (PDG), providing a solid foundation for applying traditional security policies at the decision level.
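To make the DDG idea concrete, the following is a minimal, hypothetical sketch: it builds a weighted directed graph over logical concepts from a toy attention matrix and flags any source concept whose dependency weight onto the invocation decision is anomalously dominant. All names (`build_ddg`, `attribute_anomaly`, `invoke_tool`), thresholds, and the attention values are illustrative assumptions, not the paper's actual construction or detection policy.

```python
# Hypothetical DDG sketch -- concept names, thresholds, and the toy
# attention matrix are illustrative assumptions, not MindGuard's code.
from collections import defaultdict

def build_ddg(concepts, attention, threshold=0.1):
    """Build a Decision Dependence Graph: vertices are logical concepts,
    directed edges carry attention-based dependency weights."""
    ddg = defaultdict(dict)
    for i, src in enumerate(concepts):
        for j, dst in enumerate(concepts):
            if i != j and attention[i][j] >= threshold:
                ddg[src][dst] = attention[i][j]
    return ddg

def attribute_anomaly(ddg, decision="invoke_tool", limit=0.5):
    """Flag source concepts whose dependency weight onto the invocation
    decision is anomalously high (toy stand-in for graph anomaly analysis)."""
    return [src for src, edges in ddg.items()
            if edges.get(decision, 0.0) > limit]

# Toy example: the poisoned tool description dominates the decision,
# even though the poisoned tool itself is never executed.
concepts = ["user_query", "tool_desc_benign", "tool_desc_poisoned", "invoke_tool"]
attention = [
    [0.0, 0.2, 0.1, 0.15],
    [0.1, 0.0, 0.0, 0.20],
    [0.1, 0.0, 0.0, 0.85],  # dominant dependency -> suspicious
    [0.0, 0.0, 0.0, 0.0],
]
ddg = build_ddg(concepts, attention)
print(attribute_anomaly(ddg))  # ['tool_desc_poisoned']
```

Because attribution operates on graph edges rather than tool behavior, the poisoned description is traced as the root cause before any call is made, which is the decision-level property the abstract emphasizes.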