EvoSkill: Automated Skill Discovery for Multi-Agent Systems

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-agent systems struggle to acquire reusable, domain-specific skills automatically and often rely on handcrafted components tightly coupled to specific models and tasks. This work proposes EvoSkill, a framework that, to the authors' knowledge, is the first to achieve skill-level self-evolution via Pareto-frontier selection. By iteratively analyzing failure cases and combining skill editing, structured storage, and validation-set-driven evolutionary optimization, EvoSkill automatically discovers efficient, transferable skills without fine-tuning the underlying model. The method improves accuracy by 7.3% (from 60.6% to 67.9%) on OfficeQA and by 12.1% (from 26.6% to 38.7%) on SealQA. Moreover, skills evolved on SealQA yield a 5.3% gain when transferred zero-shot to BrowseComp, substantially reducing the dependence of conventional approaches on task- and model-specific designs.

📝 Abstract
Coding agents are increasingly used as general-purpose problem solvers, but their flexibility does not by itself confer the domain expertise needed for specialized tasks. Recent work addresses this through \textit{agent skills}: reusable workflows and code that augment agents with domain-specific capabilities. Most skills today are hand-crafted, and existing evolutionary approaches optimize low-level artifacts (e.g., prompts \& code) that are tightly coupled to specific models and tasks. We introduce \textbf{EvoSkill}, a self-evolving framework that automatically discovers and refines agent skills through iterative failure analysis. EvoSkill analyzes execution failures, proposes new skills or edits to existing ones, and materializes them into structured, reusable skill folders. A Pareto frontier of agent programs governs selection, retaining only skills that improve held-out validation performance while the underlying model remains frozen. We evaluate EvoSkill on two benchmarks: OfficeQA, a grounded reasoning benchmark over U.S.\ Treasury data, where it improves exact-match accuracy by \textbf{7.3\%} (60.6\% $\to$ 67.9\%); and SealQA, a search-augmented QA benchmark with noisy retrieval, where it yields a \textbf{12.1\%} gain (26.6\% $\to$ 38.7\%). We also investigate the zero-shot transfer of skills evolved on one task to another; in particular, skills evolved on SealQA transfer zero-shot to BrowseComp, improving accuracy by \textbf{5.3\%} without modification, demonstrating that skill-level optimization produces transferable capabilities beyond the training task.
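The Pareto-frontier selection described in the abstract can be sketched as follows. This is a hypothetical illustration, not EvoSkill's actual code: `dominates`, `update_frontier`, the skill names, and the per-task score tuples are all assumed for the example, and real candidates would carry full skill folders and validation runs rather than toy tuples.

```python
# Minimal sketch, assuming each candidate is a dict with a set of skills and a
# vector of held-out validation scores (one entry per validation task).

def dominates(a, b):
    """True if score vector `a` is >= `b` on every task and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def update_frontier(frontier, candidate):
    """Keep `candidate` only if no frontier member dominates it; drop any
    frontier members that the candidate dominates."""
    scores = candidate["scores"]
    if any(dominates(m["scores"], scores) for m in frontier):
        return frontier  # an existing skill set is at least as good everywhere
    kept = [m for m in frontier if not dominates(scores, m["scores"])]
    kept.append(candidate)
    return kept

# Toy evolution step: after a failure analysis proposes a new skill, the edited
# skill set survives only if it improves validation performance.
frontier = [{"skills": ["base"], "scores": (0.60, 0.27)}]
proposal = {"skills": ["base", "table-lookup"], "scores": (0.68, 0.27)}
frontier = update_frontier(frontier, proposal)
```

Under this selection rule a proposed skill edit that matches the incumbent on every validation task and beats it on at least one replaces the incumbent, which mirrors the abstract's "retaining only skills that improve held-out validation performance" while the model stays frozen.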
Problem

Research questions and friction points this paper is trying to address.

automated skill discovery
multi-agent systems
domain expertise
reusable skills
zero-shot transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

EvoSkill
automated skill discovery
self-evolving framework
Pareto-based selection
zero-shot skill transfer