Retrieval-Augmented Decision Transformer: External Memory for In-context RL

📅 2024-10-09
🏛️ arXiv.org
📈 Citations: 9
Influential: 0
🤖 AI Summary
Long episodes with sparse rewards hinder in-context reinforcement learning (ICL-RL), as existing methods require entire episodes in the agent's context and are therefore constrained to simple environments. Method: We propose Retrieval-Augmented Decision Transformer (RA-DT), which augments the agent with a non-parametric external memory of past experiences. Conditioned on the current situation, RA-DT retrieves only the relevant sub-trajectories, drastically reducing the required context length; the retrieval component needs no training and can be entirely domain-agnostic. Contribution/Results: On grid-world environments, RA-DT outperforms baselines while using only a fraction of their context length; evaluations on robotics simulations and procedurally-generated video games further illuminate the limitations of current ICL-RL methods in complex environments. To facilitate future research, we release datasets for four of the considered environments.

📝 Abstract
In-context learning (ICL) is the ability of a model to learn a new task by observing a few exemplars in its context. While prevalent in NLP, this capability has recently also been observed in Reinforcement Learning (RL) settings. Prior in-context RL methods, however, require entire episodes in the agent's context. Given that complex environments typically lead to long episodes with sparse rewards, these methods are constrained to simple environments with short episodes. To address these challenges, we introduce Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external memory mechanism to store past experiences from which it retrieves only sub-trajectories relevant for the current situation. The retrieval component in RA-DT does not require training and can be entirely domain-agnostic. We evaluate the capabilities of RA-DT on grid-world environments, robotics simulations, and procedurally-generated video games. On grid-worlds, RA-DT outperforms baselines, while using only a fraction of their context length. Furthermore, we illuminate the limitations of current in-context RL methods on complex environments and discuss future directions. To facilitate future research, we release datasets for four of the considered environments.
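The core mechanism described in the abstract, an external memory of past experiences from which only situation-relevant sub-trajectories are retrieved, can be sketched as a non-parametric store queried by embedding similarity. This is a minimal illustration, not the paper's implementation: the `TrajectoryMemory` class, the embedding vectors, and top-k cosine-similarity search are all assumptions for the sake of the example.

```python
import numpy as np

class TrajectoryMemory:
    """Non-parametric memory of sub-trajectory segments, keyed by embeddings.

    Illustrative sketch of retrieval-augmented decision-making: how segments
    are embedded and stored is assumed, not taken from the paper.
    """

    def __init__(self):
        self.keys = []      # one embedding vector per stored segment
        self.segments = []  # the raw sub-trajectory segments

    def add(self, embedding, segment):
        self.keys.append(np.asarray(embedding, dtype=np.float64))
        self.segments.append(segment)

    def retrieve(self, query, k=2):
        """Return the k stored segments most similar to the query embedding."""
        keys = np.stack(self.keys)
        q = np.asarray(query, dtype=np.float64)
        # cosine similarity between the query and every stored key
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [self.segments[i] for i in top]

memory = TrajectoryMemory()
memory.add([1.0, 0.0], "segment-A")  # hypothetical embedded sub-trajectories
memory.add([0.0, 1.0], "segment-B")
memory.add([0.9, 0.1], "segment-C")

# Retrieve the segments closest to the current situation's embedding
print(memory.retrieve([1.0, 0.1], k=2))
```

Because retrieval is plain nearest-neighbor search over frozen embeddings, no component of the memory itself needs training, which matches the abstract's claim that the retrieval component can be training-free and domain-agnostic.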
Problem

Research questions and friction points this paper is trying to address.

Limited context length constrains in-context reinforcement learning
Prior methods require entire episodes in the agent's context
Complex environments with long, sparse-reward episodes remain out of reach for episodic contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

External memory mechanism for storing past experiences
Retrieval of relevant sub-trajectories without training the retriever
Domain-agnostic retrieval component for RL
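The payoff of the innovations above is that the agent conditions on a few retrieved sub-trajectories plus its current state instead of a full episode history. The sketch below illustrates only the context-length arithmetic; the token representation, segment sizes, and `max_len` window are hypothetical.

```python
def build_context(current_state_tokens, retrieved_segments, max_len=20):
    """Concatenate retrieved sub-trajectories with the current state,
    in place of a full-episode history (illustrative sketch)."""
    context = []
    for seg in retrieved_segments:
        context.extend(seg)
    context.extend(current_state_tokens)
    return context[-max_len:]  # truncate to the model's context window

full_episode = list(range(100))        # a long episode: 100 tokens
retrieved = [[7, 8, 9], [42, 43, 44]]  # two relevant sub-trajectories
ctx = build_context([98, 99], retrieved, max_len=20)
print(len(ctx), len(full_episode))     # the context is a fraction of the episode
```

The model's input grows with the number and size of retrieved segments, not with episode length, which is what lets this style of agent scale to long-horizon environments.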