🤖 AI Summary
Large language model (LLM) agents face challenges in skill acquisition, including reliance on human-collected demonstration trajectories, low-quality and inefficient self-generated tasks, and insufficient learning signals. Method: This paper proposes EXIF, an exploration-prioritized, closed-loop feedback framework for automatic skill discovery. Its core innovation is a dual-agent collaboration: the exploration agent (Alice) actively explores the environment, performs trial-and-error, and retrospectively generates executable, semantically coherent skill data; the goal agent (Bob) is trained on this data and provides multi-round performance evaluations that iteratively refine Alice's exploration strategy. The framework enables self-evolution within a single LLM, requiring no human intervention. Results: On the WebShop and Crafter benchmarks, EXIF significantly improves task completion rates, autonomously discovers high-quality skill sets, and progressively enhances agent capability without any human annotations.
📝 Abstract
Training large language model (LLM) agents to acquire necessary skills and perform diverse tasks within an environment is gaining interest as a means to enable open-endedness. However, creating the training dataset for their skill acquisition faces several challenges. Manual trajectory collection requires significant human effort. Another approach, where LLMs directly propose tasks to learn, often yields invalid tasks, as the LLMs lack knowledge of which tasks are actually feasible. Moreover, the generated data may not provide a meaningful learning signal, as agents often already perform well on the proposed tasks. To address this, we propose a novel automatic skill discovery framework, EXIF, for LLM-powered agents, designed to improve the feasibility of generated target behaviors while accounting for the agents' capabilities. Our method adopts an exploration-first strategy, employing an exploration agent (Alice) to train the target agent (Bob) to learn essential skills in the environment. Specifically, Alice first interacts with the environment to retrospectively generate a feasible, environment-grounded skill dataset, which is then used to train Bob. Crucially, we incorporate an iterative feedback loop, where Alice evaluates Bob's performance to identify areas for improvement. This feedback then guides Alice's next round of exploration, forming a closed-loop data generation process. Experiments on WebShop and Crafter demonstrate EXIF's ability to effectively discover meaningful skills and iteratively expand the capabilities of the trained agent without any human intervention, achieving substantial performance improvements. Interestingly, we observe that setting Alice to the same model as Bob also notably improves performance, demonstrating EXIF's potential for building a self-evolving system.
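The explore-train-evaluate cycle described above can be sketched as a simple loop. This is a minimal toy illustration, not the paper's implementation: all class and method names (`explore`, `label_skills`, `train`, `evaluate`) and the toy environment are assumptions made for clarity.

```python
class ToyEnv:
    """Hypothetical environment exposing a set of learnable skills."""
    def __init__(self):
        self.skills = {"pick", "place", "search"}

class Alice:
    """Exploration agent: explores, labels skill data, evaluates Bob."""
    def explore(self, env, feedback):
        # Focus exploration on skills Bob is weak at (from feedback);
        # with no feedback yet, explore the whole environment.
        targets = feedback or env.skills
        return [{"skill": s, "trajectory": f"demo-{s}"} for s in sorted(targets)]

    def label_skills(self, trajectories):
        # Retrospectively turn raw trajectories into skill training data.
        return trajectories

    def evaluate(self, bob, env):
        # Feedback = skills Bob has not yet mastered.
        return {s for s in env.skills if s not in bob.learned}

class Bob:
    """Target agent trained on Alice's skill dataset."""
    def __init__(self):
        self.learned = set()

    def train(self, skill_data):
        for item in skill_data:
            self.learned.add(item["skill"])

def exif_loop(alice, bob, env, rounds=3):
    """Closed-loop data generation: explore -> train -> evaluate."""
    feedback = None
    for _ in range(rounds):
        trajectories = alice.explore(env, feedback)   # 1. exploration
        skill_data = alice.label_skills(trajectories) # 2. retrospective labeling
        bob.train(skill_data)                         # 3. train target agent
        feedback = alice.evaluate(bob, env)           # 4. feedback for next round
    return bob
```

In this toy version the loop trivially converges in one round; the point is the control flow, in which Bob's evaluated weaknesses steer where Alice explores next.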