AI Summary
Current agent skills lack a unified framework for evaluating both utility and security. This work proposes a standardized comparative assessment methodology that measures skill effectiveness and risk holistically through paired execution comparisons, standalone security probes, normalized output artifacts, and multidimensional scoring. Grounded in a comparative utility principle and a user-facing simplicity principle, the approach yields distinct utility and security scores along with a three-level security status label, enabling comparable quality evaluation across skills. The methodology is operationalized in an open public service, skilltester.ai, which supports automated, standardized evaluation of AI agent skills and thereby provides infrastructure for trustworthy AI applications.
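To make the normalized output contract concrete, here is a minimal sketch of the evaluation record as a small Python type. All names here (`SkillReport`, `SecurityStatus`, and the `pass`/`warn`/`fail` tiers) are illustrative assumptions, not SkillTester's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class SecurityStatus(Enum):
    """Three-level security status label; tier names are assumed
    for illustration and may differ from SkillTester's labels."""
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


@dataclass(frozen=True)
class SkillReport:
    """One normalized evaluation artifact per assessed skill."""
    skill_id: str
    utility_score: float       # lift over the no-skill baseline
    security_score: float      # aggregated from standalone security probes
    status: SecurityStatus
```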
Abstract
This technical report presents SkillTester, a tool for evaluating the utility and security of agent skills. Its evaluation framework combines paired baseline and with-skill execution conditions with a separate security probe suite. Grounded in a comparative utility principle and a user-facing simplicity principle, the framework normalizes raw execution artifacts into a utility score, a security score, and a three-level security status label. More broadly, it can be understood as a comparative quality-assurance harness for agent skills in an agent-first world. The public service is deployed at https://skilltester.ai, and the broader project is maintained at https://github.com/skilltester-ai/skilltester.
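A minimal sketch of the paired-execution flow described above is shown below, reusing the `SkillReport` and `SecurityStatus` types from the earlier sketch. The function names, scorer signatures, and status thresholds are all assumptions for illustration; SkillTester's actual pipeline and scoring rules are not specified here.

```python
from statistics import mean
from typing import Callable, Sequence

# Assumes SkillReport and SecurityStatus from the earlier sketch.

Agent = Callable[[str], str]          # hypothetical: prompt -> transcript
Scorer = Callable[[str, str], float]  # hypothetical: (prompt, transcript) -> score in [0, 1]


def evaluate_skill(
    skill_id: str,
    baseline_agent: Agent,
    skilled_agent: Agent,
    tasks: Sequence[str],
    probes: Sequence[str],
    task_scorer: Scorer,
    probe_scorer: Scorer,
) -> SkillReport:
    """Run each task twice (without and with the skill) and take the
    mean score lift as utility; run security probes standalone against
    the skill-enabled agent and aggregate them into a security score."""
    utility = mean(
        task_scorer(t, skilled_agent(t)) - task_scorer(t, baseline_agent(t))
        for t in tasks
    )
    security = mean(probe_scorer(p, skilled_agent(p)) for p in probes)

    # Thresholds below are invented for this sketch, not SkillTester's rules.
    if security >= 0.9:
        status = SecurityStatus.PASS
    elif security >= 0.6:
        status = SecurityStatus.WARN
    else:
        status = SecurityStatus.FAIL

    return SkillReport(skill_id, utility, security, status)
```

Keeping the utility measurement comparative (a paired lift rather than an absolute score) is what makes results meaningful across skills with very different task difficulty.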