🤖 AI Summary
Existing autonomous web agents struggle to abstract procedural knowledge, refine skills, and compose them, which limits their capacity for sustained self-improvement. This paper introduces SkillWeaver, an LLM-driven framework in which an agent explores a new website, practices tasks on it, and distills the resulting execution traces into lightweight, reusable API-style skills. These skills support hierarchical composition and can be transferred across agents. The agent carries out skill discovery, practice, and distillation into robust APIs autonomously, and iterative exploration keeps expanding its skill library. On WebArena and on real-world websites, SkillWeaver yields relative success-rate improvements of 31.8% and 39.8%, respectively. Moreover, APIs synthesized by stronger agents improve weaker agents by up to 54.3% on WebArena, showing that the learned skills can be shared among different web agents.
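The percentages above are relative gains in task success rate (the abstract calls them "relative success rate improvements"), not percentage-point differences. Writing SR for task success rate, the standard definition of a relative improvement is:

```latex
\Delta_{\mathrm{rel}} = \frac{\mathrm{SR}_{\mathrm{skills}} - \mathrm{SR}_{\mathrm{base}}}{\mathrm{SR}_{\mathrm{base}}} \times 100\%
```

As a purely hypothetical illustration, not a number reported in the paper, suppose the baseline success rate were 30%. A 31.8% relative gain would then correspond to 30% × 1.318 ≈ 39.5% success. That is about 9.5 percentage points, not 31.8.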
📝 Abstract
To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reusable skills, and collaborative construction of an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling to abstract procedural knowledge, refine skills, and compose them. In this work, we introduce SkillWeaver, a skill-centric framework enabling agents to self-improve by autonomously synthesizing reusable skills as APIs. Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs. Iterative exploration continually expands a library of lightweight, plug-and-play APIs, significantly enhancing the agent's capabilities. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively. Additionally, APIs synthesized by strong agents substantially enhance weaker agents through transferable skills, yielding improvements of up to 54.3% on WebArena. These results demonstrate the effectiveness of honing diverse website interactions into APIs, which can be seamlessly shared among various web agents.
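To make the discover, practice, and distill loop more concrete, here is a toy Python sketch. It is not SkillWeaver's implementation, since the abstract specifies no code. Every name in it is hypothetical: `SkillLibrary`, `bind`, `distill`, `explore`, the `$param` placeholder convention, and the `search`/`add_to_cart` primitives. Browser actions are replaced by pure functions over a dict. The LLM that would propose tasks and judge success is replaced by hand-written proposals and check functions. The sketch shows only the structure the abstract describes:

- practice a proposed skill;
- keep it only if verification passes;
- wrap the verified trace as a parameterized API;
- let later skills call earlier ones (hierarchical composition).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical toy model: a "page" is a dict standing in for browser state.
Page = Dict[str, Any]
# A step is (skill name, keyword args); a string "$x" means "parameter x".
Step = Tuple[str, Dict[str, Any]]


def bind(args: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Replace "$name" placeholders with concrete parameter values."""
    return {k: params[v[1:]] if isinstance(v, str) and v.startswith("$") else v
            for k, v in args.items()}


@dataclass
class SkillLibrary:
    """A growing collection of plug-and-play, API-style skills."""
    skills: Dict[str, Callable[..., Page]] = field(default_factory=dict)
    docs: Dict[str, str] = field(default_factory=dict)  # what an agent would read

    def register(self, name: str, fn: Callable[..., Page], doc: str) -> None:
        self.skills[name] = fn
        self.docs[name] = doc

    def run(self, steps: List[Step], page: Page, **params: Any) -> Page:
        """Execute steps in order, threading the page state through."""
        for name, args in steps:
            page = self.skills[name](page, **bind(args, params))
        return page


# Primitive skills: toy versions of browser actions (hypothetical).
def search(page: Page, query: str) -> Page:
    return {**page, "url": f"/search?q={query}"}


def add_to_cart(page: Page, item: str) -> Page:
    return {**page, "cart": page["cart"] + [item]}


def distill(lib: SkillLibrary, name: str, steps: List[Step], doc: str) -> None:
    """Wrap a verified trace as a new parameterized API built from existing ones."""
    frozen = list(steps)

    def skill(page: Page, **params: Any) -> Page:
        return lib.run(frozen, page, **params)

    lib.register(name, skill, doc)


def explore(lib: SkillLibrary, proposals: List[tuple], start: Page) -> List[str]:
    """One round: practice each proposal and keep only the verified ones."""
    learned = []
    for name, steps, example, check, doc in proposals:
        try:
            result = lib.run(steps, start, **example)  # practice with example args
        except KeyError:  # practice failed, e.g. a step names an unknown skill
            continue
        if check(result):  # stand-in for an LLM or rule-based success check
            distill(lib, name, steps, doc)
            learned.append(name)
    return learned


lib = SkillLibrary()
lib.register("search", search, "Search the site for a query.")
lib.register("add_to_cart", add_to_cart, "Add an item to the cart.")
start: Page = {"url": "/", "cart": []}

round1 = [
    ("buy_item", [("search", {"query": "$item"}), ("add_to_cart", {"item": "$item"})],
     {"item": "mug"}, lambda p: p["cart"] == ["mug"], "Find an item and add it to the cart."),
    ("checkout", [("pay", {})], {}, lambda p: True, "Pay for the cart."),
]
print(explore(lib, round1, start))  # ['buy_item']  ("pay" does not exist yet)

# Round 2 composes the distilled skill hierarchically.
round2 = [
    ("buy_two", [("buy_item", {"item": "$a"}), ("buy_item", {"item": "$b"})],
     {"a": "mug", "b": "pen"}, lambda p: p["cart"] == ["mug", "pen"],
     "Add two items to the cart by reusing buy_item."),
]
print(explore(lib, round2, start))  # ['buy_two']
print(lib.skills["buy_two"](start, a="lamp", b="desk")["cart"])  # ['lamp', 'desk']
```

In this sketch, distilled skills are plain callables stored in a dictionary next to their documentation. Handing the `skills` and `docs` dictionaries to another agent is one simple way to model the cross-agent transfer the abstract reports. A real system would also need to serialize the generated code and make its documentation available to the receiving agent.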