Hierarchical Continual Reinforcement Learning via Large Language Model

📅 2024-01-25

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Existing continual reinforcement learning (CRL) methods primarily focus on fine-grained knowledge transfer across similar tasks, struggling to support cross-task generalization over heterogeneous task sequences. To address this, we propose a two-level goal-driven framework: a large language model (LLM) serves as a high-level controller that generates abstract, generalizable goal sequences, while a low-level policy network executes goal-conditioned actions; iterative, feedback-driven optimization constructs a retrievable and reusable hierarchical skill library. This work is the first to systematically integrate LLMs into CRL as high-level goal generators and to establish a hierarchical goal–skill co-evolution mechanism. Evaluated on continuous, heterogeneous MiniGrid benchmarks, our approach significantly outperforms mainstream CRL baselines, achieving consistent improvements in generalization capability, training stability, and sample efficiency.

📝 Abstract

The ability to learn continuously in dynamic environments is a crucial requirement for reinforcement learning (RL) agents deployed in the real world. Despite progress in continual reinforcement learning (CRL), existing methods often suffer from insufficient knowledge transfer, particularly when the tasks are diverse. To address this challenge, we propose a new framework, Hierarchical Continual reinforcement learning via large language model (Hi-Core), designed to facilitate the transfer of high-level knowledge. Hi-Core orchestrates a two-layer structure: high-level policy formulation by a large language model (LLM), which generates a sequence of goals, and low-level policy learning that closely aligns with goal-oriented RL practices, producing the agent's actions in response to the goals set forth. The framework employs feedback to iteratively adjust and verify high-level policies, storing them along with low-level policies within a skill library. When encountering a new task, Hi-Core retrieves relevant experience from this library to aid learning. In experiments on MiniGrid, Hi-Core handles diverse CRL tasks effectively and outperforms popular baselines.
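
The feedback loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: `propose_goals` is a hard-coded stand-in for the LLM call, `run_low_level` fakes the training and evaluation of a goal-conditioned policy, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Skill:
    task: str
    goals: List[str]   # verified high-level policy (a goal sequence)
    policy_id: str     # handle to the low-level policy trained for it


@dataclass
class SkillLibrary:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def store(self, skill: Skill) -> None:
        self.skills[skill.task] = skill


def propose_goals(task: str, feedback: List[str]) -> List[str]:
    """Hypothetical stand-in for the LLM (it ignores `task`).

    Starts from a naive plan and prepends any goal that earlier
    feedback reported as missing.
    """
    goals = ["reach the door"]
    for note in feedback:
        if note.startswith("missing:"):
            goals.insert(0, note[len("missing:"):].strip())
    return goals


def run_low_level(goals: List[str]) -> Tuple[bool, str]:
    """Hypothetical stand-in for goal-conditioned RL.

    In this toy setting the task only succeeds if the plan
    includes picking up the key.
    """
    if "pick up the key" not in goals:
        return False, "missing: pick up the key"
    return True, "ok"


def learn_task(task: str, library: SkillLibrary, max_rounds: int = 3) -> bool:
    """Adjust the goal sequence from feedback until it is verified."""
    feedback: List[str] = []
    for round_idx in range(max_rounds):
        goals = propose_goals(task, feedback)
        success, note = run_low_level(goals)
        if success:
            # Store the verified goals together with the low-level policy handle.
            library.store(Skill(task, goals, policy_id=f"{task}-v{round_idx}"))
            return True
        feedback.append(note)
    return False


library = SkillLibrary()
learned = learn_task("unlock door", library)
```

In this toy run the first plan fails, the failure note is fed back to the planner, and the second plan is verified and stored. A real system would replace both stubs with an LLM query and an RL training loop.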

Problem

Research questions and friction points this paper is trying to address.

Enhancing knowledge transfer in continual reinforcement learning

Addressing insufficient transfer across diverse tasks

Combining coarse and fine-grained policy learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity policy learning framework

LLM for coarse-grained goal setting

Policy library for knowledge transfer
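
As a rough illustration of how a policy library might be queried when a new task arrives, the sketch below retrieves the stored task whose description overlaps most with the new one. This summary does not say how relevance is measured, so the token-overlap (Jaccard) score, the `min_score` threshold, and the function names are all assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


def tokenize(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two task descriptions (assumed metric)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(
    library: Dict[str, List[str]],
    new_task: str,
    min_score: float = 0.2,  # hypothetical relevance threshold
) -> Optional[Tuple[str, List[str]]]:
    """Return the most similar stored task and its goal sequence, or None."""
    best: Optional[Tuple[str, List[str]]] = None
    best_score = 0.0
    for task, goals in library.items():
        score = jaccard(task, new_task)
        if score >= min_score and score > best_score:
            best, best_score = (task, goals), score
    return best


# Toy library: task description -> stored goal sequence.
skills = {
    "pick up the key and open the door": ["pick up the key", "open the door"],
    "go to the red ball": ["go to the red ball"],
}
hit = retrieve(skills, "open the door with the key")
miss = retrieve(skills, "cross the lava")
```

Returning `None` when nothing clears the threshold lets the agent fall back to learning the task from scratch instead of reusing an unrelated skill.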

🔎 Similar Papers

Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study