🤖 AI Summary
Open-ended AI agents must continually acquire increasingly complex, abstract, and diverse goals in dynamically evolving environments; however, existing hierarchical reinforcement learning (HRL) approaches rely on manually defined subgoal spaces and pretrained low-level policies, limiting adaptability to natural goal-space expansion and wide difficulty gaps.
Method: We propose a language-driven hierarchical RL framework wherein a large language model (LLM) serves as the high-level controller for goal generalization and decomposition; a dynamic skill compilation mechanism automatically distills successfully achieved goals into reusable low-level policies; and a dynamic neural network policy enables end-to-end continual learning.
Contribution/Results: Evaluated in the Crafter environment, our framework significantly improves sample efficiency, enables scalable acquisition of complex goals, and demonstrates strong zero-shot generalization and adaptation to unseen tasks—without requiring handcrafted subgoals or fixed skill libraries.
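The core mechanism, a two-level loop in which mastered goals are compiled into fast low-level skills that the high-level controller can then invoke as subgoals, can be sketched as follows. This is an illustrative simplification, not the paper's implementation; all names (`SkillLibrary`, `record_success`, the success threshold) are assumptions.

```python
class SkillLibrary:
    """Sketch of HERAKLES-style skill compilation: goals the agent
    reliably achieves are distilled into reusable low-level policies,
    dynamically expanding the high-level controller's subgoal space.
    All names and the threshold heuristic are illustrative."""

    def __init__(self):
        self.skills = {}          # goal -> compiled low-level policy (callable)
        self.success_count = {}   # goal -> number of recorded successes

    def record_success(self, goal, policy, threshold=3):
        """Compile a goal into a skill once it has been achieved
        `threshold` times (stand-in for the paper's mastery criterion)."""
        self.success_count[goal] = self.success_count.get(goal, 0) + 1
        if self.success_count[goal] >= threshold and goal not in self.skills:
            self.skills[goal] = policy  # now callable as a subgoal


def run_episode(high_level, library, goal):
    """Two-level control loop: the high-level controller (standing in
    for the LLM) decomposes the goal into subgoals given the current
    skill set; compiled skills are executed directly by the small,
    fast low-level policy, others fall back to primitive-action RL."""
    for subgoal in high_level(goal, available_skills=list(library.skills)):
        if subgoal in library.skills:
            library.skills[subgoal]()  # fast low-level execution
        else:
            pass  # learn this subgoal with primitive actions instead
```

As the library grows, the same high-level decomposition query is made over a richer subgoal vocabulary, which is what lets sample complexity stay under control as goals get harder.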
📝 Abstract
Open-ended AI agents need to be able to efficiently learn goals of increasing complexity, abstraction and heterogeneity over their lifetime. Beyond efficiently sampling their own goals, autotelic agents specifically need to be able to keep the growing complexity of goals under control, limiting the associated growth in sample and computational complexity. To address this challenge, recent approaches have leveraged hierarchical reinforcement learning (HRL) and language, capitalizing on its compositional and combinatorial generalization capabilities to acquire temporally extended reusable behaviours. Existing approaches use expert-defined spaces of subgoals over which they instantiate a hierarchy, and often assume pre-trained associated low-level policies. Such designs are inadequate in open-ended scenarios, where goal spaces naturally diversify across a broad spectrum of difficulties. We introduce HERAKLES, a framework that enables a two-level hierarchical autotelic agent to continuously compile mastered goals into the low-level policy, executed by a small, fast neural network, dynamically expanding the set of subgoals available to the high-level policy. We train a Large Language Model (LLM) to serve as the high-level controller, exploiting its strengths in goal decomposition and generalization to operate effectively over this evolving subgoal space. We evaluate HERAKLES in the open-ended Crafter environment and show that it scales effectively with goal complexity, improves sample efficiency through skill compilation, and enables the agent to adapt robustly to novel challenges over time.