SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of systematic research on how LLM agents can automatically and continually learn reusable skills for real-world tasks. It introduces SkillLearnBench, the first benchmark for continual skill learning, comprising 20 verified, skill-dependent tasks across 15 sub-domains, with evaluation at three levels: skill quality, execution trajectory, and task outcome. The benchmark is used to evaluate recent continual learning techniques that generate skills from agent experience, including one-shot generation, self-feedback, teacher feedback, and a skill creator. All of these methods outperform the no-skill baseline, but no method leads across all tasks and LLMs, and stronger LLMs do not consistently yield better skills. Continual learning helps most on tasks with clear, reusable workflows and struggles on open-ended ones. Across multiple iterations, external feedback drives genuine improvement, whereas self-feedback alone induces recursive drift.

📝 Abstract

Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how to learn them automatically and effectively remains unclear. We introduce SkillLearnBench, the first benchmark for evaluating continual skill learning methods, comprising 20 verified, skill-dependent tasks across 15 sub-domains derived from a real-world skill taxonomy, evaluated at three levels: skill quality, execution trajectory, and task outcome. Using this benchmark, we evaluate recent continual learning techniques that generate skills from agent experiences, including those based on one-shot generation, self/teacher feedback, and a skill creator. We find that all continual learning methods improve over the no-skill baseline, yet consistent gains remain elusive: no method leads across all tasks and LLMs, and scaling to stronger LLMs does not reliably help. Continual learning improves tasks with clear, reusable workflows but struggles on open-ended tasks, and using stronger LLM backbones does not consistently produce better skills. Our analysis also reveals that multiple iterations in continual learning facilitate genuine improvement via external feedback, whereas self-feedback alone induces recursive drift. Our data and code are open-source at https://github.com/cxcscmu/SkillLearnBench to enable further studies of automatic skill generation and continual learning techniques.
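
The iterative setting described in the abstract, in which a skill is revised over several rounds using feedback on the agent's execution, can be illustrated with a minimal sketch. This is not the benchmark's code or API: the `Skill` class, `refine_skill`, and the toy runner, teacher, and reviser below are all hypothetical names chosen for illustration, and the "teacher" is a simple checklist standing in for an external feedback source.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Skill:
    """Hypothetical container for a learned skill (illustrative only)."""
    instructions: str
    revision: int = 0


def refine_skill(
    initial: Skill,
    run_task: Callable[[Skill], str],            # skill -> execution trajectory
    get_feedback: Callable[[Skill, str], str],   # (skill, trajectory) -> critique, "" if none
    revise: Callable[[Skill, str], Skill],       # (skill, critique) -> revised skill
    rounds: int,
) -> List[Skill]:
    """Run up to `rounds` feedback iterations and return every skill revision."""
    skill = initial
    history = [skill]
    for _ in range(rounds):
        trajectory = run_task(skill)
        critique = get_feedback(skill, trajectory)
        if not critique:  # feedback source has nothing left to correct
            break
        skill = revise(skill, critique)
        history.append(skill)
    return history


# Toy stand-ins (assumptions, not part of SkillLearnBench).
REQUIRED_STEPS = ["load data", "validate schema", "write report"]


def toy_run(skill: Skill) -> str:
    # The "trajectory" is just the instructions the agent would follow.
    return skill.instructions


def toy_teacher_feedback(skill: Skill, trajectory: str) -> str:
    # External feedback: report the first required step missing from the trajectory.
    for step in REQUIRED_STEPS:
        if step not in trajectory:
            return step
    return ""


def toy_revise(skill: Skill, critique: str) -> Skill:
    # Append the missing step named in the critique.
    return Skill(skill.instructions + "; " + critique, skill.revision + 1)


history = refine_skill(Skill("load data"), toy_run, toy_teacher_feedback, toy_revise, rounds=5)
final = history[-1]
print(final.revision, final.instructions)
```

In this sketch the feedback source is passed in as a function, so a self-feedback variant would differ only in what `get_feedback` is allowed to see; the toy example does not attempt to reproduce the recursive drift reported in the abstract.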

Problem

Research questions and friction points this paper is trying to address.

continual learning

skill generation

LLM agents

real-world tasks

automatic skill learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning

skill generation

LLM agents