ARROW: Augmented Replay for RObust World models

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the trade-off between catastrophic forgetting and forward transfer in continual reinforcement learning with a neuroscience-inspired, world-model approach. Instead of replaying experiences directly to the policy network, the method replays them to a predictive world model and introduces a dual-buffer mechanism that integrates short- and long-term memory, preserving task diversity through intelligent sampling from these buffers. Built on DreamerV3 and incorporating a memory-efficient replay strategy based on distribution matching, the proposed method substantially mitigates catastrophic forgetting while maintaining comparable forward transfer, outperforming model-free and model-based baselines with same-size replay buffers on benchmarks including Atari and Procgen CoinRun variants.
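A minimal Python sketch of one plausible reading of this dual-buffer design is below. The class names, the per-task reservoir-sampling rule, and the even per-task memory split are illustrative assumptions; the summary does not specify ARROW's actual distribution-matching scheme.

```python
import random
from collections import deque


class ShortTermBuffer:
    """FIFO buffer holding only the most recent transitions."""

    def __init__(self, capacity: int):
        self.data = deque(maxlen=capacity)  # oldest items evicted first

    def add(self, transition):
        self.data.append(transition)

    def sample(self, n: int):
        return random.sample(list(self.data), min(n, len(self.data)))


class LongTermBuffer:
    """Fixed-size buffer keeping an approximately uniform sample per task
    via reservoir sampling, used here as a simple stand-in for the
    distribution-matching selection the summary alludes to."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.reservoirs: dict[int, list] = {}  # task_id -> kept transitions
        self.seen: dict[int, int] = {}         # task_id -> items offered so far

    def _quota(self) -> int:
        # Split the memory budget evenly across tasks seen so far.
        return self.capacity // max(1, len(self.reservoirs))

    def add(self, task_id: int, transition):
        res = self.reservoirs.setdefault(task_id, [])
        self.seen[task_id] = self.seen.get(task_id, 0) + 1
        quota = self._quota()
        # Shrink older tasks' reservoirs when a new task claims space.
        for r in self.reservoirs.values():
            del r[quota:]
        if len(res) < quota:
            res.append(transition)
        else:
            # Classic reservoir step: each offered item survives
            # with probability quota / items_seen_for_this_task.
            j = random.randrange(self.seen[task_id])
            if j < quota:
                res[j] = transition

    def sample(self, n: int):
        pool = [t for res in self.reservoirs.values() for t in res]
        return random.sample(pool, min(n, len(pool)))
```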

📝 Abstract
Continual reinforcement learning challenges agents to acquire new skills while retaining previously learned ones, with the goal of improving performance in both past and future tasks. Most existing approaches rely on model-free methods with replay buffers to mitigate catastrophic forgetting; however, these solutions often face significant scalability challenges due to large memory demands. Drawing inspiration from neuroscience, where the brain replays experiences to a predictive world model rather than directly to the policy, we present ARROW (Augmented Replay for RObust World models), a model-based continual RL algorithm that extends DreamerV3 with a memory-efficient, distribution-matching replay buffer. Unlike standard fixed-size FIFO buffers, ARROW maintains two complementary buffers: a short-term buffer for recent experiences and a long-term buffer that preserves task diversity through intelligent sampling. We evaluate ARROW on two challenging continual RL settings: tasks without shared structure (Atari), and tasks with shared structure where knowledge transfer is possible (Procgen CoinRun variants). Compared to model-free and model-based baselines with replay buffers of the same size, ARROW demonstrates substantially less forgetting on tasks without shared structure, while maintaining comparable forward transfer. Our findings highlight the potential of model-based RL and bio-inspired approaches for continual reinforcement learning, warranting further research.
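As a concrete illustration of replaying to the world model rather than to the policy, here is a hedged sketch of how a training minibatch might mix the two buffers. It assumes the hypothetical ShortTermBuffer and LongTermBuffer classes sketched above, and the 50/50 recent_frac mixing ratio is an assumption rather than a value reported in the abstract.

```python
import random


def world_model_batch(short_buf, long_buf, batch_size: int,
                      recent_frac: float = 0.5):
    """Assemble a minibatch for training the world model (not the policy).

    Mixes recent experience with diversity-preserving long-term replay.
    `recent_frac` is a hypothetical knob; ARROW's actual schedule is not
    given in the abstract.
    """
    n_recent = int(batch_size * recent_frac)
    batch = short_buf.sample(n_recent)
    batch += long_buf.sample(batch_size - len(batch))
    random.shuffle(batch)  # avoid ordering artifacts within the batch
    return batch
```

In a DreamerV3-style loop, batches like this would train the world model while the policy and value heads learn from imagined rollouts, so old data never needs to be replayed to the policy network directly, consistent with the neuroscience motivation described above.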
Problem

Research questions and friction points this paper is trying to address.

continual reinforcement learning
catastrophic forgetting
replay buffer
scalability
world model
Innovation

Methods, ideas, or system contributions that make the work stand out.

model-based reinforcement learning
continual learning
experience replay
world model
catastrophic forgetting
👥 Authors
Abdulaziz Alyahya
Department of Information Systems, Imam Mohammad Ibn Saud Islamic University (IMSIU)
Abdallah Al Siyabi
Department of Data Science & AI, Monash University
Markus R. Ernst
Researcher, Frankfurt Institute for Advanced Studies
Artificial Intelligence, Reinforcement Learning, Robotics
Luke Yang
Department of Data Science & AI, Monash University
Levin Kuhlmann
Associate Professor in Data Science, AI and Digital Health, Faculty of Information Technology
Data Science, AI, Digital Health and Neuro-engineering, Epilepsy, Anaesthesia and Consciousness
Gideon Kowadlo
Cerenaut – https://cerenaut.ai