Re:Frame -- Retrieving Experience From Associative Memory

📅 2025-08-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Offline reinforcement learning suffers from limited generalization when expert demonstrations are scarce and non-expert data is of low quality. To address this, we propose Re:Frame, a plug-and-play module that introduces an external Associative Memory Buffer (AMB) to enable content-based experience retrieval. Without modifying the backbone architecture or requiring environment interaction, Re:Frame efficiently integrates a minimal number of expert trajectories (as little as 0.1% of the total data) with large-scale non-expert datasets. It dynamically injects expert knowledge during both training and inference, adapting seamlessly to sequence-based offline RL frameworks such as the Decision Transformer. Evaluated on the D4RL MuJoCo benchmark, Re:Frame outperforms strong baselines on three of four tasks, achieving gains of up to +10.7 normalized score points. The method substantially improves policy performance and data efficiency under low-quality data conditions.

πŸ“ Abstract
Offline reinforcement learning (RL) often deals with suboptimal data when collecting large expert datasets is unavailable or impractical. This limitation makes it difficult for agents to generalize and achieve high performance, as they must learn primarily from imperfect or inconsistent trajectories. A central challenge is therefore how to best leverage scarce expert demonstrations alongside abundant but lower-quality data. We demonstrate that incorporating even a tiny amount of expert experience can substantially improve RL agent performance. We introduce Re:Frame (Retrieving Experience From Associative Memory), a plug-in module that augments a standard offline RL policy (e.g., Decision Transformer) with a small external Associative Memory Buffer (AMB) populated by expert trajectories drawn from a separate dataset. During training on low-quality data, the policy learns to retrieve expert data from the Associative Memory Buffer (AMB) via content-based associations and integrate them into decision-making; the same AMB is queried at evaluation. This requires no environment interaction and no modifications to the backbone architecture. On D4RL MuJoCo tasks, using as few as 60 expert trajectories (0.1% of a 6000-trajectory dataset), Re:Frame consistently improves over a strong Decision Transformer baseline in three of four settings, with gains up to +10.7 normalized points. These results show that Re:Frame offers a simple and data-efficient way to inject scarce expert knowledge and substantially improve offline RL from low-quality datasets.
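The content-based retrieval described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes expert trajectories are stored in the AMB as key/value embedding pairs and addressed with scaled dot-product attention; the names `AssociativeMemoryBuffer` and `retrieve` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class AssociativeMemoryBuffer:
    """Toy content-addressable buffer (illustrative, not the paper's code):
    stores expert state embeddings as keys and associated payloads
    (e.g., expert action embeddings) as values."""

    def __init__(self, keys, values):
        self.keys = np.asarray(keys, dtype=float)      # shape (M, d)
        self.values = np.asarray(values, dtype=float)  # shape (M, d_v)

    def retrieve(self, query):
        # Scaled dot-product attention: the query (current state embedding)
        # softly addresses stored expert entries by content similarity,
        # returning a similarity-weighted mixture of their values.
        d = self.keys.shape[1]
        scores = self.keys @ query / np.sqrt(d)  # (M,)
        weights = softmax(scores)                # (M,), sums to 1
        return weights @ self.values             # (d_v,)
```

A query that closely matches one stored key retrieves essentially that key's value, which is the behavior the policy relies on when pulling in relevant expert experience.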
Problem

Research questions and friction points this paper is trying to address.

Leveraging scarce expert demonstrations with abundant low-quality data
Improving offline RL agent performance from imperfect trajectories
Integrating expert experience via associative memory without environment interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Associative Memory Buffer for expert data retrieval
Content-based association for decision integration
Plug-in module enhancing offline RL policies
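The plug-in nature highlighted above can be sketched as a gated-residual fusion that injects the retrieved expert embedding into the backbone's hidden state without changing the backbone itself. This is a hedged sketch: `MemoryFusion`, its gating scheme, and the random initialization are illustrative assumptions, not the paper's actual layer.

```python
import numpy as np

class MemoryFusion:
    """Illustrative plug-in layer (assumed design, not the paper's):
    fuses a retrieved expert embedding m into the backbone's hidden
    state h via a scalar-gated residual, so the backbone (e.g., a
    Decision Transformer) needs no architectural changes."""

    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(d, d))  # projects the memory
        self.g = rng.normal(scale=0.1, size=2 * d)   # scalar-gate weights

    def __call__(self, h, m):
        # Sigmoid gate decides how strongly the memory is injected.
        gate = 1.0 / (1.0 + np.exp(-self.g @ np.concatenate([h, m])))
        return h + gate * (self.W @ m)  # gated residual injection
```

Because the fusion is residual, a zero (uninformative) memory leaves the hidden state unchanged, which is one simple way a plug-in could degrade gracefully when no relevant expert experience is retrieved.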