Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

📅 2026-04-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing defenses struggle to recover malicious intent consistently under semantics-preserving rewrites and lack structured security auditing for multi-file agent skill packages. This work proposes SkillGuard-Robust, a framework that treats pre-load auditing of untrusted Agent Skills as a robust three-way classification task over the whole package. It combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication to support cross-file review. The framework is evaluated on SkillGuardBench and two public-ecosystem extensions. On the 404-package held-out aggregate, it achieves 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact consistency. On the 254-package external-ecosystem view, it reaches 99.66%, 100.00%, and 100.00%, respectively.
📝 Abstract
Agent Skills package SKILL.md files, scripts, reference documents, and repository context into reusable capability units, turning pre-load auditing from single-prompt filtering into cross-file security review. Existing guardrails often flag risk but recover malicious intent inconsistently under semantics-preserving rewrites. This paper formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task and introduces SkillGuard-Robust, which combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. We evaluate SkillGuard-Robust on SkillGuardBench and two public-ecosystem extensions through five large evaluation views ranging from 254 to 404 packages. On the 404-package held-out aggregate, SkillGuard-Robust reaches 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact consistency. On the 254-package external-ecosystem view, it reaches 99.66%, 100.00%, and 100.00%, respectively. These results support a bounded conclusion: factorized package auditing materially improves frozen and public-ecosystem robustness, while harsher external-source transfer remains an open challenge.
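The abstract reports three metrics without defining them on this page. The sketch below shows one plausible way to compute them for a three-way audit. Everything in it is an assumption made for illustration, not the paper's definition: the label names (`benign`, `suspicious`, `malicious`), the `AuditRecord` type, the `attack_group` field that ties together semantics-preserving rewrites of one attack, and the reading of "attack exact consistency" as the share of attack groups in which every variant receives its gold label.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One audited skill package (hypothetical schema, not from the paper)."""
    package_id: str
    attack_group: Optional[str]  # assumed: shared id for rewrites of one attack
    gold: str                    # assumed labels: benign / suspicious / malicious
    predicted: str


def exact_match(records: List[AuditRecord]) -> float:
    """Fraction of packages whose predicted label equals the gold label."""
    return sum(r.gold == r.predicted for r in records) / len(records)


def malicious_recall(records: List[AuditRecord], label: str = "malicious") -> float:
    """Fraction of gold-malicious packages that were predicted malicious."""
    positives = [r for r in records if r.gold == label]
    if not positives:
        return float("nan")
    return sum(r.predicted == label for r in positives) / len(positives)


def attack_exact_consistency(records: List[AuditRecord]) -> float:
    """Fraction of attack groups in which every variant is labeled correctly."""
    groups = defaultdict(list)
    for r in records:
        if r.attack_group is not None:
            groups[r.attack_group].append(r)
    if not groups:
        return float("nan")
    consistent = sum(all(r.predicted == r.gold for r in g) for g in groups.values())
    return consistent / len(groups)


# Toy data: two attack groups with two rewrites each, plus one benign package.
demo = [
    AuditRecord("p1", "a1", "malicious", "malicious"),
    AuditRecord("p2", "a1", "malicious", "malicious"),
    AuditRecord("p3", "a2", "malicious", "suspicious"),  # rewrite that slips through
    AuditRecord("p4", "a2", "malicious", "malicious"),
    AuditRecord("p5", None, "benign", "benign"),
]
```

Under these assumed definitions, a single missed rewrite lowers the group-level consistency score more than it lowers per-package recall, which is one reason a consistency metric can be reported separately from recall.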
Problem

Research questions and friction points this paper is trying to address.

Security Auditing
Agent Skills
Robustness
Malicious Intent
Pre-load Auditing
Innovation

Methods, ideas, or system contributions that make the work stand out.

structured security auditing
Agent Skills
robustness enhancement
semantic-preserving attacks
cross-file verification
Lijia Lv
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Xuehai Tang
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Jie Wen
Associate Professor, North University of China (NUC)
Quantum Control, Prognostic and Health Management
Jizhong Han
Institute of Information Engineering, Chinese Academy of Sciences
Songlin Hu
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences