🤖 AI Summary
Weak generalization to unseen tasks hinders the practical deployment of reinforcement learning (RL) agents. To address this, we propose a memory-augmented meta-RL framework centered on a novel task-structured memory mechanism: it explicitly models structural relationships among tasks to enable context-aware policy adaptation, achieving zero-shot cross-task transfer without any environmental interaction. Our approach integrates task embeddings, a meta-policy network, and a differentiable, updateable memory module. We validate it on both simulated and real-world legged locomotion tasks—including deployment on physical quadrupedal robots. Results demonstrate successful zero-shot generalization to novel tasks, robust in-distribution performance, and significantly improved sample efficiency over state-of-the-art baselines.
📝 Abstract
In reinforcement learning (RL), agents often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. In this work, we introduce memory augmentation, a memory-based RL approach to improve task generalization. Our approach leverages task-structured augmentations to simulate plausible out-of-distribution scenarios and incorporates memory mechanisms to enable context-aware policy adaptation. Trained on a predefined set of tasks, our policy demonstrates the ability to generalize to unseen tasks through memory augmentation without requiring additional interactions with the environment. Through extensive simulation experiments and real-world hardware evaluations on legged locomotion tasks, we demonstrate that our approach achieves zero-shot generalization to unseen tasks while maintaining robust in-distribution performance and high sample efficiency.