SkillTester: Benchmarking Utility and Security of Agent Skills

📅 2026-03-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
No unified framework currently exists for evaluating both the utility and the security of agent skills. This work proposes a standardized comparative methodology that measures a skill's effectiveness and risk through paired executions (baseline versus with-skill), a separate security probe suite, normalized output artifacts, and multidimensional scoring. Grounded in the principles of comparative utility and user-facing simplicity, the approach yields separate utility and security scores along with a three-level security status label, so that results are comparable across skills. The methodology is deployed as a public service at skilltester.ai, which supports automated, standardized evaluation of agent skills and is intended as infrastructure for trustworthy AI applications.
πŸ“ Abstract
This technical report presents SkillTester, a tool for evaluating the utility and security of agent skills. Its evaluation framework combines paired baseline and with-skill execution conditions with a separate security probe suite. Grounded in a comparative utility principle and a user-facing simplicity principle, the framework normalizes raw execution artifacts into a utility score, a security score, and a three-level security status label. More broadly, it can be understood as a comparative quality-assurance harness for agent skills in an agent-first world. The public service is deployed at https://skilltester.ai, and the broader project is maintained at https://github.com/skilltester-ai/skilltester.
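The abstract does not give the scoring formulas, so the following is only a minimal sketch of how paired runs and probe results might be normalized into the three outputs it names. Everything here is an assumption made for illustration: the `PairedRun` record, the pass-rate-difference utility metric, the probe pass-fraction security metric, and the `pass`/`warn`/`fail` labels with their cutoffs. None of it is taken from SkillTester itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairedRun:
    """One task executed twice: without the skill and with it (hypothetical record)."""
    task_id: str
    baseline_passed: bool
    with_skill_passed: bool


def utility_score(runs: List[PairedRun]) -> float:
    """Assumed metric: with-skill pass rate minus baseline pass rate, in [-1, 1]."""
    if not runs:
        raise ValueError("need at least one paired run")
    baseline_rate = sum(r.baseline_passed for r in runs) / len(runs)
    with_skill_rate = sum(r.with_skill_passed for r in runs) / len(runs)
    return with_skill_rate - baseline_rate


def security_score(probe_passed: List[bool]) -> float:
    """Assumed metric: fraction of security probes the skill passed, in [0, 1]."""
    if not probe_passed:
        raise ValueError("need at least one probe result")
    return sum(probe_passed) / len(probe_passed)


def security_status(score: float, warn_at: float = 0.8, pass_at: float = 1.0) -> str:
    """Map a security score to one of three labels; names and cutoffs are placeholders."""
    if score >= pass_at:
        return "pass"
    if score >= warn_at:
        return "warn"
    return "fail"


# Illustrative data only: four paired task runs and five probe outcomes.
runs = [
    PairedRun("t1", baseline_passed=False, with_skill_passed=True),
    PairedRun("t2", baseline_passed=True, with_skill_passed=True),
    PairedRun("t3", baseline_passed=False, with_skill_passed=False),
    PairedRun("t4", baseline_passed=False, with_skill_passed=True),
]
probes = [True, True, True, False, True]

print(utility_score(runs))                      # 0.5
print(security_score(probes))                   # 0.8
print(security_status(security_score(probes)))  # warn
```

Scoring utility as a difference against the baseline, rather than as a raw with-skill pass rate, reflects the comparative utility principle: a skill earns credit only for tasks the agent could not already complete without it.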
Problem

Research questions and friction points this paper is trying to address.

agent skills
utility evaluation
security assessment
benchmarking
quality assurance
Innovation

Methods, ideas, or system contributions that make the work stand out.

agent skills
utility evaluation
security assessment
comparative framework
quality assurance
Leye Wang
Tenured Associate Professor, Peking University
Ubiquitous Computing, Urban Computing, Crowdsensing, Federated Learning
Zixing Wang
Key Laboratory of High Confidence Software Technologies, Ministry of Education, Peking University, Beijing, China; Northwestern Polytechnical University, Xi'an, China
Anjie Xu
Key Laboratory of High Confidence Software Technologies, Ministry of Education, Peking University, Beijing, China