Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

📅 2025-09-25

📈 Citations: 0

✨ Influential: 0

career value

204K/year

🤖 AI Summary

This work addresses the tool poisoning threat faced by large language model (LLM) agents when integrating Model Context Protocol (MCP) tools. We propose AutoMalTool, the first automated red-teaming framework specifically designed for MCP poisoning scenarios. AutoMalTool systematically synthesizes malicious MCP tools while incorporating behavioral manipulation and detection evasion techniques to enable end-to-end penetration testing against state-of-the-art LLM agents. Unlike prior manual evaluation approaches, AutoMalTool is the first to systematically uncover novel security vulnerabilities within the MCP ecosystem: its generated malicious tools reliably hijack agent behavior and bypass existing defense mechanisms—demonstrating severe real-world deployment risks. Our empirical findings provide critical evidence for establishing MCP security standards and designing robust, trustworthy LLM agents.

Technology Category

Application Category

📝 Abstract

The remarkable capability of large language models (LLMs) has led to the wide application of LLM-based agents in various domains. To standardize interactions between LLM-based agents and their environments, model context protocol (MCP) tools have become the de facto standard and are now widely integrated into these agents. However, the incorporation of MCP tools introduces the risk of tool poisoning attacks, which can manipulate the behavior of LLM-based agents. Although previous studies have identified such vulnerabilities, their red teaming approaches have largely remained at the proof-of-concept stage, leaving the automatic and systematic red teaming of LLM-based agents under the MCP tool poisoning paradigm an open question. To bridge this gap, we propose AutoMalTool, an automated red teaming framework for LLM-based agents by generating malicious MCP tools. Our extensive evaluation shows that AutoMalTool effectively generates malicious MCP tools capable of manipulating the behavior of mainstream LLM-based agents while evading current detection mechanisms, thereby revealing new security risks in these agents.

Problem

Research questions and friction points this paper is trying to address.

Automatically testing LLM-based agents for tool poisoning vulnerabilities

Systematically generating malicious MCP tools to manipulate agent behavior

Addressing security risks in LLM agents using standardized protocol tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated red teaming framework for LLM-based agents

Generates malicious MCP tools to manipulate behavior

Evades detection mechanisms to reveal security risks

🔎 Similar Papers

No similar papers found.