🤖 AI Summary
Negative transfer induced by task transitions severely hinders continual reinforcement learning (CRL), yet its prevalence and mechanisms in CRL remain underexplored. Method: This work is the first to systematically characterize this phenomenon in CRL and proposes a “reset-and-distill” two-stage mechanism: (i) resetting the online policy and value networks to mitigate interference from obsolete tasks, and (ii) offline cross-task knowledge distillation grounded in action probability distributions to preserve critical experience. Both stages are fully compatible with standard Actor-Critic frameworks and require no additional memory or parameter expansion. Contribution/Results: Evaluated on long-horizon continual task sequences in Meta-World, the approach achieves a 12.7% absolute improvement in average task success rate over state-of-the-art methods, demonstrating superior stability and adaptability. This work establishes a novel paradigm for modeling and mitigating negative transfer in CRL.
📝 Abstract
We argue that negative transfer, which occurs when a new task to learn arrives, is an important problem that should not be overlooked when developing effective Continual Reinforcement Learning (CRL) algorithms. Through comprehensive experimental validation, we demonstrate that this issue frequently arises in CRL and cannot be effectively addressed by several recent works on mitigating plasticity loss in RL agents. To that end, we develop Reset&Distill (R&D), a simple yet highly effective method for overcoming the negative transfer problem in CRL. R&D combines two components: resetting the agent's online actor and critic networks to learn a new task, and an offline learning step that distills knowledge from the online actor and the previous expert's action probabilities. We carried out extensive experiments on long sequences of Meta-World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R&D to mitigate its detrimental effects.
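The two ingredients named in the abstract, a reset of the online actor and an offline distillation step over action probabilities, can be illustrated with a toy sketch. This is not the authors' implementation: the linear-softmax policy, the KL(teacher || student) loss, the learning rate, and every function and variable name below are assumptions chosen for illustration, and the online RL training that would follow the reset is omitted entirely.

```python
import math
import random


def init_policy(n_features, n_actions, rng, scale=1.0):
    """Fresh random weights for a toy linear-softmax policy (hypothetical stand-in for an actor network)."""
    return [[rng.gauss(0.0, scale) for _ in range(n_features)] for _ in range(n_actions)]


def action_probs(weights, state):
    """Softmax over linear logits, computed in a numerically stable way."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_kl(student, batch, eps=1e-12):
    """Mean KL(teacher || student) over (state, teacher_probs) pairs."""
    total = 0.0
    for state, teacher_p in batch:
        q = action_probs(student, state)
        total += sum(p * math.log((p + eps) / (qa + eps)) for p, qa in zip(teacher_p, q))
    return total / len(batch)


def distill_step(student, batch, lr):
    """One full-batch gradient step on the mean KL; d(KL)/d(logit_a) = q_a - p_a."""
    grad = [[0.0] * len(student[0]) for _ in student]
    for state, teacher_p in batch:
        q = action_probs(student, state)
        for a in range(len(student)):
            diff = q[a] - teacher_p[a]
            for j, s_j in enumerate(state):
                grad[a][j] += diff * s_j
    for a, row in enumerate(student):
        for j in range(len(row)):
            row[j] -= lr * grad[a][j] / len(batch)


rng = random.Random(0)
n_features, n_actions = 4, 3

# Assumption: the previous expert is summarized by stored action probabilities on old-task states.
prev_expert = init_policy(n_features, n_actions, rng)
old_states = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(20)]
old_batch = [(s, action_probs(prev_expert, s)) for s in old_states]

# Reset: the online actor starts the new task from fresh weights instead of
# inheriting the old ones. (Online RL training on the new task is omitted here.)
online_actor = init_policy(n_features, n_actions, rng)
new_states = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(20)]
new_batch = [(s, action_probs(online_actor, s)) for s in new_states]

# Distill: the offline actor imitates both teachers' action probabilities.
offline_actor = init_policy(n_features, n_actions, rng)
batch = old_batch + new_batch
before = mean_kl(offline_actor, batch)
for _ in range(200):
    distill_step(offline_actor, batch, lr=0.5)
after = mean_kl(offline_actor, batch)
print(f"mean KL before: {before:.4f}, after: {after:.4f}")
```

In this sketch the offline actor is a single linear model, so it can only approximate two different teachers; the point is the data flow (reset online learner, then distill from stored and fresh action probabilities), not the achievable accuracy.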