Predictive Experience Replay for Continual Visual Control and Forecasting

📅 2023-03-12

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

To address catastrophic forgetting when learning physical dynamics across a sequence of non-stationary visual environments, this paper proposes a mixture world model (MG-WM) that learns task-specific dynamics priors with a mixture of Gaussians and is meant to act as a non-forgetting environment simulator. MG-WM is trained with predictive experience replay. For continual RL, it is paired with an exploratory-conservative behavior learning approach that addresses value estimation problems, so the policy can be optimized in a multi-task manner on trajectories imagined by the world model. The method combines three properties in one model-based reinforcement learning (MBRL) framework: continuity of environment simulation across tasks, experience replay guided by the model's own predictions, and behavior learning that balances exploration against conservative value estimation. On continual visual control tasks from the DeepMind Control and Meta-World benchmarks, MG-WM outperforms naive combinations of existing continual learning and visual RL algorithms. On video prediction datasets with evolving domains, it also alleviates the forgetting of spatiotemporal dynamics.
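
The mixture-of-Gaussians world model described above keeps one prior per task. The sketch below shows what such task-specific priors could look like, assuming one diagonal Gaussian component per task and uniform mixture weights. For brevity each component is a fixed vector of means and standard deviations; in an actual world model the component would instead be produced by a transition network from the previous latent state and action. `DiagGaussian`, `MixturePrior` and `kl_diag` are hypothetical names chosen for illustration, not the paper's code.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGaussian:
    """Diagonal Gaussian over a latent vector (hypothetical helper)."""
    mean: List[float]
    std: List[float]

    def log_prob(self, z: List[float]) -> float:
        # Sum of independent 1-D Gaussian log-densities.
        total = 0.0
        for x, m, s in zip(z, self.mean, self.std):
            total += -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        return total


def kl_diag(q: DiagGaussian, p: DiagGaussian) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(q.mean, q.std, p.mean, p.std):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


class MixturePrior:
    """One Gaussian component per task; uniform weights are an assumption."""

    def __init__(self) -> None:
        self.components: List[DiagGaussian] = []

    def add_task(self, mean: List[float], std: List[float]) -> int:
        # A new component is appended for a new task; old components are kept.
        self.components.append(DiagGaussian(mean, std))
        return len(self.components) - 1

    def task_kl(self, posterior: DiagGaussian, task_id: int) -> float:
        # Task-specific KL term: the posterior is pulled toward its own task's prior.
        return kl_diag(posterior, self.components[task_id])

    def log_prob(self, z: List[float]) -> float:
        # log p(z) = logsumexp_k [log pi_k + log N(z; mu_k, sigma_k)] with pi_k = 1/K.
        logs = [c.log_prob(z) - math.log(len(self.components)) for c in self.components]
        top = max(logs)
        return top + math.log(sum(math.exp(v - top) for v in logs))
```

For example, after `add_task([0.0, 0.0], [1.0, 1.0])` and `add_task([3.0, -3.0], [0.5, 0.5])`, a posterior centred at `[3.0, -3.0]` has zero KL to the second component and a large KL to the first. Because a new task only appends a component, earlier priors are left untouched, which is the property a non-forgetting simulator needs.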

📝 Abstract

Learning physical dynamics in a series of non-stationary environments is a challenging but essential task for model-based reinforcement learning (MBRL) with visual inputs. It requires the agent to consistently adapt to novel tasks without forgetting previous knowledge. In this paper, we present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting. The key assumption is that an ideal world model can provide a non-forgetting environment simulator, which enables the agent to optimize the policy in a multi-task learning manner based on the imagined trajectories from the world model. To this end, we first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting, which we call predictive experience replay. Finally, we extend these methods to continual RL and further address the value estimation problems with the exploratory-conservative behavior learning approach. Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks. It is also shown to effectively alleviate the forgetting of spatiotemporal dynamics in video prediction datasets with evolving domains.
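
The abstract names predictive experience replay without describing it in detail. The sketch below assumes a generative-replay reading: when a task ends, only a few seed frames are kept, and while later tasks are trained, the current world model regenerates trajectories for the old tasks from those seeds, which are then mixed into training batches. The scalar observations, `toy_model`, `PredictiveReplay`, `make_batch` and `replay_ratio` are all hypothetical; the paper's actual storage and sampling scheme may differ.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Toy setting (assumption): each observation is one float and a trajectory
# is a list of observations over time.
Trajectory = List[float]
# A world model maps (seed frame, task id, horizon) to a predicted trajectory.
WorldModel = Callable[[float, int, int], Trajectory]


class PredictiveReplay:
    """Hypothetical sketch: keep a few seed frames per finished task and let
    the world model regenerate full trajectories at replay time."""

    def __init__(self, seeds_per_task: int, rng: random.Random) -> None:
        self.seeds_per_task = seeds_per_task
        self.rng = rng
        self.seeds: Dict[int, List[float]] = {}

    def store_task(self, task_id: int, trajectories: Sequence[Trajectory]) -> None:
        # Keep only the first frame of a random subset, not the full sequences.
        k = min(self.seeds_per_task, len(trajectories))
        picked = self.rng.sample(list(trajectories), k)
        self.seeds[task_id] = [traj[0] for traj in picked]

    def rehearse(self, model: WorldModel, horizon: int) -> List[Tuple[int, Trajectory]]:
        # Roll the current model forward from every stored seed, conditioned
        # on the old task's id, to produce pseudo-trajectories for replay.
        return [(task_id, model(frame, task_id, horizon))
                for task_id, frames in self.seeds.items()
                for frame in frames]


def make_batch(current: Sequence[Trajectory], current_task: int,
               rehearsed: List[Tuple[int, Trajectory]], rng: random.Random,
               replay_ratio: float = 0.5) -> List[Tuple[int, Trajectory]]:
    # Mix real data from the current task with model-generated data from
    # earlier tasks; replay_ratio is a hypothetical knob.
    n_replay = min(len(rehearsed), int(replay_ratio * len(current)))
    batch = [(current_task, traj) for traj in current]
    batch += rng.sample(rehearsed, n_replay)
    rng.shuffle(batch)
    return batch


def toy_model(frame: float, task_id: int, horizon: int) -> Trajectory:
    # Stand-in for a learned world model: task k drifts by (k + 1) per step.
    # A real model would decode predicted latent states into frames.
    return [frame + t * (task_id + 1) for t in range(horizon + 1)]
```

Storing seed frames instead of full sequences keeps memory small, at the cost of making replay quality depend on how well the model still predicts earlier tasks.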

Problem

Research questions and friction points this paper is trying to address.

Learning physical dynamics in non-stationary visual environments

Adapting to novel tasks without forgetting previous knowledge

Addressing value estimation challenges in continual visual RL

Innovation

Methods, ideas, or system contributions that make the work stand out.

Life-long world model with Gaussian mixture dynamics

Predictive experience replay to overcome catastrophic forgetting

Exploratory-conservative behavior learning for value estimation
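
The page does not describe how exploratory-conservative behavior learning works internally. As a generic stand-in, not the paper's method, the sketch below assumes an ensemble of value estimates per action and derives two scores from it: an optimistic (upper-confidence) score used when exploring, and a pessimistic (lower-confidence) score used for action choice and bootstrapped targets when acting conservatively. All function names and the parameter `beta` are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def optimistic_value(estimates: Sequence[float], beta: float) -> float:
    # Upper-confidence score: ensemble mean plus beta population std-devs.
    return statistics.fmean(estimates) + beta * statistics.pstdev(estimates)


def conservative_value(estimates: Sequence[float], beta: float) -> float:
    # Lower-confidence score: ensemble mean minus beta population std-devs.
    return statistics.fmean(estimates) - beta * statistics.pstdev(estimates)


def select_action(ensemble: Dict[str, List[float]], beta: float, explore: bool) -> str:
    # Exploratory mode favours uncertain actions; conservative mode avoids them.
    score = optimistic_value if explore else conservative_value
    return max(ensemble, key=lambda a: score(ensemble[a], beta))


def conservative_target(reward: float, next_estimates: Sequence[float],
                        gamma: float, beta: float) -> float:
    # One-step bootstrapped target from the pessimistic estimate, so that
    # overestimated values on imagined trajectories are not propagated.
    return reward + gamma * conservative_value(next_estimates, beta)
```

With `{"safe": [1.0, 1.0, 1.0], "risky": [0.0, 2.0, 2.5]}` and `beta=1.0`, the exploratory mode picks `"risky"` (its spread raises the optimistic score) while the conservative mode picks `"safe"`.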

🔎 Similar Papers

Ego-Foresight: Agent Visuomotor Prediction as Regularization for RL