Machine Learning-Based Detection of MCP Attacks

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

career value

208K/year

🤖 AI Summary

This work addresses the emerging security challenges posed by the Model Context Protocol (MCP), which introduces novel attack surfaces that existing mechanisms struggle to detect effectively. The study presents the first systematic exploration of machine learning–based approaches for identifying malicious MCP behaviors, evaluating supervised models—including Support Vector Classification (SVC) and BERT—across both binary and multiclass classification tasks. A deployable middleware prototype is developed to enable real-time interception of malicious tools prior to execution. Experimental results demonstrate that the proposed methods substantially outperform traditional rule-based baselines: multiple models achieve perfect 100% F1 scores in binary classification, while SVC and BERT attain F1 scores of 90.56% and 88.33%, respectively, in the more challenging multiclass setting.

Technology Category

Application Category

📝 Abstract

The Model Context Protocol (MCP) is a new and emerging technology that extends the functionality of large language models, improving workflows but also exposing users to a new attack surface. Several studies have highlighted related security flaws, but MCP attack detection remains underexplored. To address this research gap, this study develops and evaluates a range of supervised machine learning approaches, including both traditional and deep-learning models. We evaluated the systems on the detection of malicious MCP tool descriptions in two scenarios: (1) a binary classification task distinguishing malicious from benign tools, and (2) a multiclass classification task identifying the attack type while separating benign from malicious tools. In addition to the machine learning models, we compared a rule-based approach that serves as a baseline. The results indicate that several of the developed models achieved 100\% F1-score on the binary classification task. In the multiclass scenario, the SVC and BERT models performed best, achieving F1 scores of 90.56\% and 88.33\%, respectively. Confusion matrices were also used to visualize the full distribution of predictions often missed by traditional metrics, providing additional insight for selecting the best-fitting solution in real-world scenarios. This study presents an addition to the MCP defence area, showing that machine learning models can perform exceptionally well in separating malicious and benign data points. To apply the solution in a live environment, a middleware was developed to classify which MCP tools are safe to use before execution, and block the ones that are not safe. Furthermore, the study shows that these models can outperform traditional rule-based solutions currently in use in the field.

Problem

Research questions and friction points this paper is trying to address.

MCP attacks

attack detection

malicious tool identification

Model Context Protocol

security flaws

Innovation

Methods, ideas, or system contributions that make the work stand out.

MCP attack detection

machine learning

supervised classification