SkillOS: Learning Skill Curation for Self-Evolving Agents

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited ability of current large language model (LLM) agents to self-evolve, which stems from the lack of mechanisms for continual learning and long-term skill management. The authors propose SkillOS, a reinforcement learning (RL) training recipe that pairs a frozen LLM executor, which retrieves and applies skills, with a trainable skill curator that updates an external skill repository (SkillRepo). Using composite rewards and task streams grouped by skill-relevant task dependencies, the curator learns from indirect, delayed feedback: earlier trajectories update the repository, and later related tasks evaluate those updates. SkillOS consistently outperforms memory-free and memory-based baselines on both multi-turn agentic tasks and single-turn reasoning tasks, and the learned curator generalizes across executor backbones and task domains. Over time, the skills in the repository evolve into more richly structured Markdown files that encode higher-level meta-skills, and skill use becomes more targeted.

📝 Abstract

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.
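The grouped-stream idea in the abstract, where earlier trajectories update the SkillRepo and later related tasks evaluate those updates, can be illustrated with a small Python sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: the `SkillRepo` class, its keyword-based `retrieve` method, the toy executor, the rule-based curator, and the single success-rate reward are all assumptions made for illustration. In the paper the curator is a trained policy and the reward is composite, but the abstract does not specify its components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SkillRepo:
    """Hypothetical external skill store: skill name -> Markdown body."""
    skills: Dict[str, str] = field(default_factory=dict)

    def retrieve(self, task: str) -> List[str]:
        # Toy retrieval (assumption): a skill matches if its name occurs in the task text.
        return [body for name, body in self.skills.items() if name in task]


@dataclass
class Trajectory:
    task: str
    success: bool


Executor = Callable[[str, List[str]], Trajectory]
Curator = Callable[[SkillRepo, Trajectory], None]


def toy_executor(task: str, skills: List[str]) -> Trajectory:
    # Stand-in for the frozen executor: succeeds only if some skill was retrieved.
    return Trajectory(task=task, success=len(skills) > 0)


def rule_curator(repo: SkillRepo, traj: Trajectory) -> None:
    # Stand-in for the trainable curator: files a Markdown note under the task's first word.
    key = traj.task.split()[0]
    repo.skills.setdefault(key, f"# {key}\n\nNotes distilled from: {traj.task}\n")


def noop_curator(repo: SkillRepo, traj: Trajectory) -> None:
    # Baseline curator that never updates the repository.
    return None


def run_group(tasks: List[str], executor: Executor, curator: Curator,
              repo: SkillRepo) -> float:
    """Run one group of related tasks in order and return a delayed reward.

    The reward is the success rate on every task after the first, so it
    reflects whether earlier curation helped later related tasks.
    """
    outcomes: List[bool] = []
    for task in tasks:
        traj = executor(task, repo.retrieve(task))
        outcomes.append(traj.success)
        curator(repo, traj)  # the update only affects tasks that come later
    later = outcomes[1:]
    return sum(later) / len(later) if later else 0.0


group = ["sort list a", "sort list b", "sort list c"]
print(run_group(group, toy_executor, rule_curator, SkillRepo()))  # 1.0
print(run_group(group, toy_executor, noop_curator, SkillRepo()))  # 0.0
```

In this toy setting the curator's update after the first task only pays off on the second and third tasks, which is the sense in which the feedback is delayed: the curator is scored on downstream tasks, not on the trajectory it just processed.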

Problem

Research questions and friction points this paper is trying to address.

skill curation

self-evolving agents

long-term policy learning

delayed feedback

reusable skills

Innovation

Methods, ideas, or system contributions that make the work stand out.

skill curation

self-evolving agents

reinforcement learning