🤖 AI Summary
This paper addresses the challenge of efficiently estimating learning progress (LP) and competence for LLM-based agents in open-ended learning, where goal spaces are large and dynamically evolving. It introduces MAGELLAN, the first metacognitive monitoring framework tailored to autonomous LLM learning. Methodologically, MAGELLAN integrates online reinforcement-learning-driven metacognitive modeling, semantic-aware goal-relation embeddings, a self-supervised LP prediction head, and a dynamic curriculum-generation mechanism, enabling sample-efficient, annotation-free LP estimation and cross-goal generalization. Unlike conventional approaches that rely on extensive sampling or on brittle, manually defined goal groupings, the framework supports real-time online LP estimation and continuous adaptation to evolving goal spaces. Experiments demonstrate substantial improvements in LP prediction efficiency and goal coverage; in these evaluations, MAGELLAN is the only method that enables the agent to fully master a large, dynamically changing goal space through autonomous learning.
📝 Abstract
Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM agents trained with online RL in high-dimensional and evolving goal spaces, a key challenge for LP prediction is modeling one's own competence, a form of metacognitive monitoring. Traditional approaches either require extensive sampling or rely on brittle expert-defined goal groupings. We introduce MAGELLAN, a metacognitive framework that lets LLM agents learn to predict their competence and LP online. By capturing semantic relationships between goals, MAGELLAN enables sample-efficient LP estimation and dynamic adaptation to evolving goal spaces through generalization. In an interactive learning environment, we show that MAGELLAN improves LP prediction efficiency and goal prioritization, and is the only method that allows the agent to fully master a large and evolving goal space. These results demonstrate how augmenting LLM agents with a metacognitive ability for LP prediction can effectively scale curriculum learning to open-ended goal spaces.
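To make the core idea concrete, the following is a minimal sketch of LP-based goal prioritization in the spirit described above: a small head predicts competence from a goal embedding, LP for a goal is estimated as the absolute change between the current predictor and a delayed copy of it, and the curriculum samples goals in proportion to LP. Everything here (the linear head, logistic update, delay schedule, and all names and hyperparameters) is an illustrative assumption, not MAGELLAN's actual implementation.

```python
import numpy as np

class LPEstimator:
    """Illustrative sketch of learning-progress (LP) estimation over goal embeddings.

    A competence head (linear + sigmoid, for brevity) is trained online on
    success/failure outcomes; LP for a goal is the absolute difference between
    the current competence estimate and the estimate of a delayed snapshot of
    the head, which stands in for the agent's "past self".
    """

    def __init__(self, dim, lr=0.1, delay=20):
        self.w = np.zeros(dim)       # current competence head
        self.w_old = np.zeros(dim)   # delayed snapshot used to measure progress
        self.lr = lr
        self.delay = delay           # how often (in updates) to refresh the snapshot
        self.steps = 0

    def competence(self, goal_emb, w=None):
        """Predicted success probability for a goal embedding."""
        w = self.w if w is None else w
        return 1.0 / (1.0 + np.exp(-(goal_emb @ w)))

    def update(self, goal_emb, success):
        """One online logistic-regression step on an observed outcome in {0, 1}."""
        pred = self.competence(goal_emb)
        self.w += self.lr * (success - pred) * goal_emb
        self.steps += 1
        if self.steps % self.delay == 0:
            self.w_old = self.w.copy()

    def lp(self, goal_emb):
        """LP estimate: |current competence - past competence| on this goal."""
        return abs(self.competence(goal_emb) - self.competence(goal_emb, self.w_old))

    def sample_goal(self, goal_embs, rng):
        """Curriculum step: sample a goal index with probability proportional to LP."""
        lps = np.array([self.lp(g) for g in goal_embs]) + 1e-8  # avoid all-zero weights
        return rng.choice(len(goal_embs), p=lps / lps.sum())
```

Because competence is predicted from embeddings rather than tabulated per goal, an estimate learned on practiced goals transfers to semantically related unseen goals, which is what makes such a scheme plausible for large or growing goal spaces.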