🤖 AI Summary
In reinforcement learning, neural networks gradually lose plasticity during training, meaning their ability to learn from new experiences declines, which impairs continual learning. Existing parameter-reset methods can restore plasticity, but they cause sharp transient drops in performance that hinder real-world deployment. To address this, the authors propose AltNet, a dual-network alternating framework: an active network performs online policy learning, while a passive network learns off-policy from the active network's interactions and periodically takes over control, yielding an implicit plasticity reset without a visible performance drop. This design decouples plasticity recovery from performance preservation, easing the plasticity-stability dilemma without compromising operational robustness. Evaluated on multiple high-dimensional continuous-control tasks in the DeepMind Control Suite, AltNet achieves better sample efficiency and final performance than state-of-the-art baselines and advanced reset methods, while avoiding reset-induced performance fluctuations.
📝 Abstract
Neural networks have shown remarkable success in supervised learning when trained on a single task with a fixed dataset. However, when neural networks are trained on a reinforcement learning task, their ability to continue learning from new experiences declines over time. This decline in learning ability is known as plasticity loss. To restore plasticity, prior work has explored periodically resetting the parameters of the learning network, a strategy that often improves overall performance. However, such resets come at the cost of a temporary drop in performance, which can be dangerous in real-world settings. To overcome this instability, we introduce AltNet, a reset-based approach that restores plasticity without performance degradation by leveraging twin networks that periodically alternate roles, anchoring performance during resets: one network learns as it acts in the environment, while the other learns off-policy from the active network's interactions and a replay buffer. At fixed intervals, the active network is reset and the passive network, having learned from prior experience, becomes the new active network. AltNet restores plasticity, improving sample efficiency and achieving higher performance, while avoiding the performance drops that pose risks in safety-critical settings. We demonstrate these advantages on several high-dimensional control tasks from the DeepMind Control Suite, where AltNet outperforms various relevant baseline methods as well as state-of-the-art reset-based techniques.
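The alternation scheme described above can be sketched in a few lines. This is a minimal illustration based only on the abstract, not the paper's implementation: `Net`, `AltNetScheduler`, and `swap_interval` are hypothetical names, and the actual online/off-policy gradient updates are stubbed out as counters.

```python
class Net:
    """Stand-in for a policy network; reset() models re-initializing parameters."""
    def __init__(self):
        self.steps_trained = 0  # proxy for accumulated learning

    def reset(self):
        self.steps_trained = 0  # parameter re-initialization restores plasticity

    def train_step(self):
        self.steps_trained += 1  # placeholder for one gradient update


class AltNetScheduler:
    """Illustrative sketch of the alternating twin-network scheme:
    the active network acts and learns online, the passive network learns
    off-policy from the shared replay buffer, and every `swap_interval`
    updates they swap roles, resetting the network that becomes passive."""
    def __init__(self, swap_interval):
        self.swap_interval = swap_interval
        self.active, self.passive = Net(), Net()
        self.step = 0

    def update(self):
        self.active.train_step()   # online learning from its own interactions
        self.passive.train_step()  # off-policy learning from the replay buffer
        self.step += 1
        if self.step % self.swap_interval == 0:
            # Passive takes over control; the old active network is reset
            # and continues learning off-policy as the new passive network.
            self.active, self.passive = self.passive, self.active
            self.passive.reset()
```

Because the network taking control has already been trained off-policy on the replay buffer, the agent's behavior never passes through a freshly initialized policy, which is how the reset avoids a performance drop.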