AI Summary
Existing continual reinforcement learning (CRL) methods primarily focus on fine-grained knowledge transfer across similar tasks and struggle to support cross-task generalization over heterogeneous task sequences. To address this, we propose a two-level goal-driven framework: a large language model (LLM) serves as a high-level controller that generates abstract, generalizable goal sequences, while a low-level policy network executes goal-conditioned actions. Iterative, feedback-driven optimization builds a hierarchical skill library that can be retrieved and reused. This work is the first to systematically integrate LLMs into CRL as high-level goal generators and to establish a hierarchical goal-skill co-evolution mechanism. Evaluated on continual sequences of heterogeneous MiniGrid tasks, our approach significantly outperforms mainstream CRL baselines, with consistent improvements in generalization, training stability, and sample efficiency.
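The two-level, feedback-driven loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM controller is replaced by a hypothetical rule-based `toy_llm`, the learned low-level policy by a hypothetical hand-coded `make_toy_episode` factory, and "feedback" is simplified to the name of the first goal that failed.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper):
# - a goal generator stands in for the LLM controller: (task, feedback) -> goals
# - an episode factory resets the environment and returns a goal-conditioned
#   low-level policy that reports whether a given goal was achieved
GoalGenerator = Callable[[str, List[str]], List[str]]
GoalPolicy = Callable[[str], bool]
EpisodeFactory = Callable[[], GoalPolicy]


def run_hierarchical(task: str,
                     propose_goals: GoalGenerator,
                     new_episode: EpisodeFactory,
                     max_rounds: int = 3) -> Tuple[List[str], bool]:
    """Ask for a goal sequence, execute it, and feed failures back."""
    feedback: List[str] = []
    goals: List[str] = []
    for _ in range(max_rounds):
        goals = propose_goals(task, feedback)   # high level: plan goals
        achieve = new_episode()                 # fresh episode per attempt
        for goal in goals:                      # low level: pursue each goal
            if not achieve(goal):
                feedback.append(f"failed: {goal}")
                break
        else:
            return goals, True                  # every goal reached: plan verified
    return goals, False


def toy_llm(task: str, feedback: List[str]) -> List[str]:
    # Rule-based stand-in for an LLM prompt: adds a key step once
    # "open door" has been reported as failing.
    if "failed: open door" in feedback:
        return ["pick up key", "open door", "reach goal"]
    return ["open door", "reach goal"]


def make_toy_episode() -> GoalPolicy:
    state = {"has_key": False}

    def achieve(goal: str) -> bool:
        if goal == "pick up key":
            state["has_key"] = True
            return True
        if goal == "open door":
            return state["has_key"]  # the door only opens with the key
        return goal == "reach goal"

    return achieve


plan, ok = run_hierarchical("open the locked door and reach the goal",
                            toy_llm, make_toy_episode)
print(plan, ok)  # ['pick up key', 'open door', 'reach goal'] True
```

The first plan fails at "open door", the failure is fed back, and the revised plan succeeds. In the framework described here, a verified plan would then be stored together with the trained low-level policy; the policy is hand-coded in this sketch only to keep it self-contained.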
Abstract
The ability to learn continuously in dynamic environments is a crucial requirement for reinforcement learning (RL) agents deployed in the real world. Despite progress in continual reinforcement learning (CRL), existing methods often suffer from insufficient knowledge transfer, particularly when the tasks are diverse. To address this challenge, we propose a new framework, Hierarchical Continual reinforcement learning via large language model (Hi-Core), designed to facilitate the transfer of high-level knowledge. Hi-Core orchestrates a two-layer structure: high-level policy formulation by a large language model (LLM), which generates a sequence of goals, and low-level policy learning that closely aligns with goal-oriented RL practice, producing the agent's actions in response to those goals. The framework uses feedback to iteratively adjust and verify the high-level policies, and stores them, together with the low-level policies, in a skill library. When encountering a new task, Hi-Core retrieves relevant experience from this library to aid learning. Experiments on MiniGrid show that Hi-Core handles diverse CRL tasks effectively and outperforms popular baselines.
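The retrieval step can be illustrated with a small skill library keyed by task embeddings. This is a hedged sketch, not the paper's method: the abstract does not say how relevant experience is matched, so cosine similarity over hand-made feature vectors, the 0.5 threshold, and the names `Skill`, `SkillLibrary`, and `retrieve` are all assumptions. The `policy` field is only a placeholder for stored low-level policy parameters, and the task labels are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Skill:
    task: str                 # illustrative task label
    embedding: List[float]    # hypothetical task features
    goals: List[str]          # verified high-level goal sequence
    policy: Any = None        # placeholder for low-level policy parameters


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)

    def retrieve(self, query: Sequence[float],
                 threshold: float = 0.5) -> Optional[Skill]:
        """Return the most similar stored skill, or None if none is close enough."""
        if not self.skills:
            return None
        best = max(self.skills, key=lambda s: cosine(query, s.embedding))
        return best if cosine(query, best.embedding) >= threshold else None


# Hypothetical features: [needs key, has door, has lava]
library = SkillLibrary()
library.add(Skill("DoorKey", [1.0, 1.0, 0.0],
                  ["pick up key", "open door", "reach goal"]))
library.add(Skill("LavaCrossing", [0.0, 0.0, 1.0],
                  ["avoid lava", "reach goal"]))

hit = library.retrieve([1.0, 0.8, 0.0])  # a new, key-and-door-like task
print(hit.task if hit else None)  # DoorKey
```

A retrieved `goals` list could then seed the high-level planner or warm-start the low-level policy for the new task; when no stored skill clears the threshold, the agent would plan from scratch.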