🤖 AI Summary
This work addresses the challenge of detecting malicious skills distributed through public registries for large language model (LLM) agents, whose malicious behaviors are often concealed across heterogeneous artifacts and require contextual reasoning to identify. To this end, the authors propose MalSkills, a novel framework that introduces neuro-symbolic reasoning to this domain for the first time. MalSkills integrates symbolic parsing with LLM-based semantic analysis to extract security-sensitive operations, constructs a skill dependency graph, and performs context-aware neuro-symbolic reasoning over this graph to identify malicious patterns. Experimental results demonstrate that MalSkills achieves an F1 score of 93% on a benchmark of 200 real-world skills, outperforming existing methods by 5 to 87 percentage points. Furthermore, in a large-scale evaluation across 150,000 skills, MalSkills uncovered 620 malicious instances, including 76 previously unknown threats.
📄 Abstract
Skills are increasingly used to extend LLM agents by packaging prompts, code, and configurations into reusable modules. As public registries and marketplaces expand, they form an emerging agentic supply chain, but also introduce a new attack surface for malicious skills. Detecting malicious skills is challenging because relevant evidence is often distributed across heterogeneous artifacts and must be reasoned about in context. Existing static, LLM-based, and dynamic approaches each capture only part of this problem, making them insufficient for robust real-world detection. In this paper, we present MalSkills, a neuro-symbolic framework for malicious skill detection. MalSkills first extracts security-sensitive operations from heterogeneous artifacts through a combination of symbolic parsing and LLM-assisted semantic analysis. It then constructs a skill dependency graph that links artifacts, operations, operands, and value flows across the skill. On top of this graph, MalSkills performs neuro-symbolic reasoning to infer known malicious patterns or previously unseen suspicious workflows. We evaluate MalSkills on a benchmark of 200 real-world skills against 5 state-of-the-art baselines. MalSkills achieves 93% F1, outperforming the baselines by 5 to 87 percentage points. We further apply MalSkills to analyze 150,108 skills collected from 7 public registries, revealing 620 malicious skills. So far, we have finished reviewing 100 of them and identified 76 previously unknown malicious skills, all of which were responsibly reported and are currently awaiting confirmation from the platforms and maintainers. These results demonstrate the potential of MalSkills in securing the agentic supply chain.
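To make the skill-dependency-graph idea concrete, the following is a minimal, self-contained sketch. It is an illustration only: the node kinds, edge labels, file names, and operation names are all assumptions for this toy example, not MalSkills' actual schema. It shows how linking operations and operand value flows across artifacts lets a detector surface a cross-artifact pattern (an environment variable read in one artifact flowing into a network send in another).

```python
# Toy sketch of a "skill dependency graph": nodes are artifacts,
# security-sensitive operations, and operands; labeled edges capture
# containment and value flow. All identifiers below are hypothetical.
from collections import defaultdict


class SkillDependencyGraph:
    def __init__(self):
        self.nodes = {}                # node id -> kind ("artifact" | "operation" | "operand")
        self.edges = defaultdict(set)  # src id -> {(label, dst id)}

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def add_edge(self, src, label, dst):
        self.edges[src].add((label, dst))

    def reaches(self, start, target):
        """Depth-first search: can a value starting at `start` reach `target`?"""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(dst for _, dst in self.edges[node])
        return False


# Hypothetical skill: a prompt file triggers a credential read, and a
# bundled script sends the value to an attacker-controlled endpoint.
g = SkillDependencyGraph()
g.add_node("prompt.md", "artifact")
g.add_node("run.sh", "artifact")
g.add_node("read_env(API_KEY)", "operation")
g.add_node("http_post(evil.example)", "operation")
g.add_node("API_KEY", "operand")
g.add_edge("prompt.md", "contains", "read_env(API_KEY)")
g.add_edge("run.sh", "contains", "http_post(evil.example)")
g.add_edge("read_env(API_KEY)", "produces", "API_KEY")
g.add_edge("API_KEY", "flows_to", "http_post(evil.example)")

# Cross-artifact exfiltration pattern: credential read -> network send.
print(g.reaches("read_env(API_KEY)", "http_post(evil.example)"))  # True
```

The key point this sketch captures is that neither artifact looks malicious in isolation; only the value flow across the graph reveals the credential-exfiltration workflow the abstract describes.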