Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

📅 2024-09-04
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the core challenge of balancing plasticity (adapting to new tasks) and stability (retaining prior knowledge) in offline reinforcement learning when tasks arrive as a continual stream. We propose the first diffusion-based decision-making framework integrated with experience replay, enabling joint optimization of conditional sequence modeling and knowledge retention. We introduce the first large-scale continual offline RL benchmark, comprising 90 diverse, multi-domain tasks, and design a replay strategy that subsamples small portions of historical datasets, coupled with a multi-task continual evaluation protocol. Experiments demonstrate that our method significantly outperforms existing diffusion-based and classical continual learning baselines across multiple tasks, achieving an average performance gain of 12.7% and a superior plasticity-stability trade-off.

📝 Abstract
Artificial neural networks, especially recent diffusion-based models, have shown remarkable superiority in gaming, control, and QA systems, where the training tasks' datasets are usually static. However, in real-world applications such as robotic control with reinforcement learning (RL), the tasks change and new tasks arrive sequentially. This situation poses the new challenge of a plasticity-stability trade-off: training an agent that can adapt to task changes while retaining acquired knowledge. In view of this, we propose a rehearsal-based continual diffusion model, called Continual Diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability). Specifically, we first construct an offline benchmark that contains 90 tasks from multiple domains. Then, we train CoD on each task with sequential modeling and conditional generation for making decisions. Next, we preserve a small portion of each previous dataset as a rehearsal buffer and replay it to retain the acquired knowledge. Extensive experiments on a series of tasks show that CoD achieves a promising plasticity-stability trade-off and outperforms existing diffusion-based methods and other representative baselines on most tasks.
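The rehearsal idea in the abstract — keep a small subsample of each finished task's dataset and mix it into later training batches — can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation; the subsampling ratio, the uniform sampling scheme, and all function names are assumptions.

```python
import random

def subsample_rehearsal(dataset, ratio=0.1, seed=0):
    """Keep a small random fraction of a task's trajectories for later
    replay. (Illustrative: ratio and sampling scheme are assumptions,
    not the paper's exact settings.)"""
    rng = random.Random(seed)
    k = max(1, int(len(dataset) * ratio))
    return rng.sample(dataset, k)

def make_training_batches(current_task, rehearsal_buffers, batch_size=4):
    """Mix the current task's data with replayed samples from all
    earlier tasks, then shuffle into fixed-size batches."""
    pool = list(current_task)
    for buf in rehearsal_buffers:
        pool.extend(buf)
    random.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]

# Toy usage: trajectories are represented by string ids here.
task1 = [f"t1_{i}" for i in range(20)]
task2 = [f"t2_{i}" for i in range(20)]
buffer1 = subsample_rehearsal(task1, ratio=0.1)   # keeps 2 of 20 trajectories
batches = make_training_batches(task2, [buffer1])
```

Each batch the diffuser trains on while learning task 2 can therefore contain a few task-1 trajectories, which is what counteracts forgetting while the bulk of the batch drives adaptation to the new task.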
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Robotics
Memory Consolidation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Diffuser (CoD)
Adaptability
Memory Consolidation
Jifeng Hu
School of Artificial Intelligence, Jilin University, Changchun, China
Li Shen
Sun Yat-sen University, Zhongshan, China
Sili Huang
Jilin University
Reinforcement Learning
Zhejian Yang
School of Artificial Intelligence, Jilin University, Changchun, China
Hechang Chen
School of Artificial Intelligence, Jilin University, China
Machine Learning, Data Mining, Deep Reinforcement Learning, Complex Network Analysis, Knowledge Graph
Lichao Sun
Lehigh University, Bethlehem, Pennsylvania, USA
Yi Chang
School of Artificial Intelligence, Jilin University, Changchun, China
Dacheng Tao
Nanyang Technological University
Artificial Intelligence, Machine Learning, Computer Vision, Image Processing, Data Mining