Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing continual reinforcement learning approaches, which regularize policy (actor) networks and largely overlook the role of value functions in mitigating catastrophic forgetting. Targeting multi-cyclic continual learning scenarios, in which task sequences repeat, the paper introduces Qreg+NWLU, a method that applies data rehearsal to value function approximation in Deep Q-Networks. It extends Q-value regularization (Qreg) with two modifications: continuous data rehearsal, which collects and updates the stored Q-values throughout training, and "No-Wait" regularization, which is applied immediately rather than only after the first task. The reported results show improvements over both Qreg and conventional continual reinforcement learning methods in forgetting mitigation, learning efficiency, and knowledge transfer.

📝 Abstract

Data rehearsal has emerged as a leading approach for mitigating catastrophic forgetting in Continual Reinforcement Learning (CRL). However, existing work remains confined to policy gradient frameworks, regularizing only actors due to the performance degradation incurred by critic regularization. This actor-centric approach overlooks the potential of data rehearsal for value function approximation. Moreover, existing evaluations in CRL rarely consider multi-cyclic environments where task sequences repeat, a critical real-world scenario that exacerbates forgetting and plasticity loss. We investigate data rehearsal for Deep Q-Networks using Q-value regularization in multi-cyclic settings and propose Qreg+NWLU, which introduces two simple modifications: (1) continuous data rehearsal that dynamically collects and updates stored Q-values throughout training, and (2) "No-Wait" regularization that applies immediately rather than after the first task. Together, these modifications yield improvements in learning efficiency, forgetting mitigation, and knowledge transfer over Qreg and conventional CRL methods within value function approximation settings.
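
The two modifications named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `QRehearsalBuffer`, the functions `q_regularizer` and `total_loss`, the squared-error penalty, the weight `lam`, and the random-eviction policy are all hypothetical choices made for illustration. The sketch assumes that "updating stored Q-values" means overwriting the stored value when the same state-action pair is collected again, and that "No-Wait" means the penalty is part of the loss from the very first update.

```python
import random


class QRehearsalBuffer:
    """Hypothetical rehearsal store mapping (state, action) -> stored Q-value.

    States must be hashable in this sketch.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.q_by_key = {}
        self.rng = random.Random(seed)

    def collect(self, state, action, q_value):
        # Continuous rehearsal (assumed reading): records are collected
        # throughout training, and a revisited pair has its value overwritten.
        key = (state, action)
        if key not in self.q_by_key and len(self.q_by_key) >= self.capacity:
            # Assumed eviction policy: drop a random existing record.
            evicted = self.rng.choice(list(self.q_by_key))
            del self.q_by_key[evicted]
        self.q_by_key[key] = q_value

    def sample(self, k):
        items = list(self.q_by_key.items())
        picked = self.rng.sample(items, min(k, len(items)))
        return [(s, a, q) for (s, a), q in picked]


def q_regularizer(q_fn, batch):
    """Mean squared gap between the current Q(s, a) and the stored Q-value."""
    if not batch:
        return 0.0
    return sum((q_fn(s, a) - q_old) ** 2 for s, a, q_old in batch) / len(batch)


def total_loss(td_loss, q_fn, buffer, lam=1.0, batch_size=32):
    # "No-Wait" (assumed reading): the penalty term is included in every
    # update, including those during the first task, with no task-boundary gate.
    return td_loss + lam * q_regularizer(q_fn, buffer.sample(batch_size))
```

In a real Deep Q-Network training loop, `q_fn` would be the online network and `td_loss` the usual temporal-difference loss on the current task; here both are plain Python values so the sketch runs without a deep learning library.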

Problem

Research questions and friction points this paper is trying to address.

Continual Reinforcement Learning

Catastrophic Forgetting

Data Rehearsal

Value Function Approximation

Multi-cyclic Environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

data rehearsal

value-based reinforcement learning

catastrophic forgetting