Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning

📅 2024-03-08

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

career value

229K/year

🤖 AI Summary

Negative transfer induced by task transitions severely hinders continual reinforcement learning (CRL), yet its prevalence and mechanisms in CRL remain underexplored. Method: This work is the first to systematically characterize this phenomenon in CRL and proposes a “reset-and-distill” two-stage mechanism: (i) online policy and value network reset to mitigate interference from obsolete tasks, and (ii) offline cross-task knowledge distillation grounded in action probability distributions to preserve critical experience—both fully compatible with standard Actor-Critic frameworks and requiring no additional memory or parameter expansion. Contribution/Results: Evaluated on long-horizon continual task sequences in Meta-World, our approach achieves a 12.7% absolute improvement in average task success rate over state-of-the-art methods, demonstrating superior stability and adaptability. This work establishes a novel paradigm for modeling and mitigating negative transfer in CRL.

Technology Category

Application Category

📝 Abstract

We argue that the negative transfer problem occurring when the new task to learn arrives is an important problem that needs not be overlooked when developing effective Continual Reinforcement Learning (CRL) algorithms. Through comprehensive experimental validation, we demonstrate that such issue frequently exists in CRL and cannot be effectively addressed by several recent work on mitigating plasticity loss of RL agents. To that end, we develop Reset&Distill (R&D), a simple yet highly effective method, to overcome the negative transfer problem in CRL. R&D combines a strategy of resetting the agent's online actor and critic networks to learn a new task and an offline learning step for distilling the knowledge from the online actor and previous expert's action probabilities. We carried out extensive experiments on long sequence of Meta World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R&D to mitigate its detrimental effects.

Problem

Research questions and friction points this paper is trying to address.

Overcoming negative transfer when new tasks arrive in continual reinforcement learning

Addressing performance degradation in CRL algorithms during task transitions

Mitigating detrimental effects of knowledge interference in sequential task learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Resets actor and critic networks for new tasks

Distills knowledge from online actor and experts

Combines online learning with offline distillation steps

🔎 Similar Papers

No similar papers found.