🤖 AI Summary
This work addresses two key bottlenecks in hierarchical reinforcement learning (HRL): inefficient exploration in complex tasks and slow cross-task adaptation. To this end, we propose a meta-augmented two-level policy framework. Methodologically, we uniquely integrate gradient-based meta-learning (MAML-style inner-loop adaptation), option-based HRL, state-novelty-driven intrinsic motivation for reward shaping, and curriculum learning: the high-level policy achieves rapid task generalization via meta-training, while the low-level policy enhances exploration efficiency across the state space through intrinsic rewards, guided progressively by curriculum learning. Experiments in a custom grid-world environment demonstrate that our approach accelerates learning by 42%, increases cumulative reward by 58%, and significantly improves task success rates over baseline HRL methods. Our core contribution is a transferable, adaptive, and exploration-efficient meta-hierarchical policy architecture.
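The summary describes state-novelty-driven intrinsic motivation for reward shaping but does not give the bonus formula. As a minimal sketch, assuming a count-based bonus of the form beta / sqrt(N(s)) (a common choice, not necessarily the one used in this work), the shaped reward could be computed as follows. The class name `NoveltyBonus`, the parameter `beta`, and the use of grid coordinates as state keys are all hypothetical.

```python
import math
from collections import defaultdict


class NoveltyBonus:
    """Count-based novelty bonus (assumed form: beta / sqrt(visit count))."""

    def __init__(self, beta=0.1):
        self.beta = beta                 # bonus scale (hypothetical parameter)
        self.counts = defaultdict(int)   # visit count per state

    def shaped_reward(self, state, extrinsic_reward):
        # Record the visit first, so a first visit has count 1 and gets
        # the largest possible bonus.
        self.counts[state] += 1
        bonus = self.beta / math.sqrt(self.counts[state])
        return extrinsic_reward + bonus


shaper = NoveltyBonus(beta=1.0)
first = shaper.shaped_reward((0, 0), 0.0)    # first visit to cell (0, 0)
second = shaper.shaped_reward((0, 0), 0.0)   # repeat visit, smaller bonus
```

Under this assumption the bonus decays as a cell is revisited, so the low-level policy is nudged toward cells it has seen less often.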
📝 Abstract
Hierarchical Reinforcement Learning (HRL) is well suited to solving complex tasks by breaking them down into structured policies. However, HRL agents often struggle with efficient exploration and quick adaptation. To overcome these limitations, we propose integrating meta-learning into HRL to enable agents to learn and adapt hierarchical policies more effectively. Our method leverages meta-learning to facilitate rapid task adaptation using prior experience, while intrinsic motivation mechanisms drive efficient exploration by rewarding the discovery of novel states. Specifically, our agent employs a high-level policy to choose among multiple low-level policies within custom-designed grid environments. By incorporating gradient-based meta-learning with differentiable inner-loop updates, we optimize performance across a curriculum of progressively challenging tasks. Experimental results show that our meta-learning-enhanced hierarchical agent significantly outperforms standard HRL approaches that lack meta-learning and intrinsic motivation. The agent demonstrates faster learning, greater cumulative rewards, and higher success rates in complex grid-based scenarios. These findings underscore the effectiveness of combining meta-learning, curriculum learning, and intrinsic motivation to enhance the capability of HRL agents in tackling complex tasks.
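To make "gradient-based meta-learning with differentiable inner-loop updates" concrete, here is a toy sketch that is not the paper's agent. It assumes each task is a one-dimensional quadratic loss L(theta) = 0.5 * (theta - target)^2 in place of an RL objective, so the gradient through the inner update can be written by hand. The function names and the step sizes `alpha` and `beta` are hypothetical.

```python
def inner_adapt(theta, target, alpha):
    """One inner-loop gradient step on L(theta) = 0.5 * (theta - target)**2."""
    grad = theta - target
    return theta - alpha * grad


def meta_gradient(theta, target, alpha):
    """Gradient of the post-adaptation loss with respect to the initial theta."""
    adapted = inner_adapt(theta, target, alpha)
    # Chain rule through the inner step:
    # dL(adapted)/dtheta = (adapted - target) * d(adapted)/d(theta),
    # and d(adapted)/d(theta) = 1 - alpha for this quadratic loss.
    return (adapted - target) * (1.0 - alpha)


def meta_train(targets, alpha=0.5, beta=0.5, steps=200, theta=0.0):
    """Outer loop: average the meta-gradients over tasks and update theta."""
    for _ in range(steps):
        grads = [meta_gradient(theta, t, alpha) for t in targets]
        theta -= beta * sum(grads) / len(grads)
    return theta


# Two toy tasks with targets 1.0 and 3.0 (illustrative values only).
theta_star = meta_train([1.0, 3.0])
```

In this toy setting the meta-learned initialization settles midway between the two task targets, from which a single inner step moves equally well toward either one. In an RL setting the same structure would apply to policy parameters, with the hand-written derivative replaced by automatic differentiation through the inner update.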