🤖 AI Summary
In multi-task reinforcement learning, representational plasticity degrades during training—manifesting as neuronal dormancy and representational collapse—thereby impairing adaptation to novel tasks. To address this, we propose a dynamic sparse training paradigm that integrates gradual magnitude pruning (GMP) with sparse evolutionary training (SET), systematically evaluating the regulatory effect of sparsity on plasticity across shared-backbone, Mixture-of-Experts (MoE), and orthogonal-MoE architectures. Experiments demonstrate that sparse agents substantially alleviate representational rigidity: they outperform dense baselines on multiple benchmarks, improve plasticity metrics by up to 23.6%, and match the multi-task performance of explicit plasticity-intervention methods. Crucially, this work establishes, for the first time, an intrinsic link between sparse network structure and neural plasticity, yielding a new architectural design principle for efficient and robust multi-task representation learning.
📝 Abstract
Plasticity loss, a diminishing capacity to adapt as training progresses, is a critical challenge in deep reinforcement learning. We examine this issue in multi-task reinforcement learning (MTRL), where high representational flexibility is crucial for managing diverse and potentially conflicting task demands. We systematically explore how sparsification methods, particularly Gradual Magnitude Pruning (GMP) and Sparse Evolutionary Training (SET), enhance plasticity and consequently improve performance in MTRL agents. We evaluate these approaches across distinct MTRL architectures (shared backbone, Mixture of Experts, Mixture of Orthogonal Experts) on standardized MTRL benchmarks, comparing against dense baselines and a comprehensive range of alternative plasticity-inducing or regularization methods. Our results demonstrate that both GMP and SET effectively mitigate key indicators of plasticity degradation, such as neuron dormancy and representational collapse. These plasticity improvements often correlate with enhanced multi-task performance, with sparse agents frequently outperforming dense counterparts and achieving competitive results against explicit plasticity interventions. Our findings offer insights into the interplay between plasticity, network sparsity, and MTRL designs, highlighting dynamic sparsification as a robust but context-sensitive tool for developing more adaptable MTRL systems.
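To make the GMP mechanism concrete, here is a minimal NumPy sketch of the two ingredients it typically combines: the polynomial sparsity schedule popularized by Zhu & Gupta (2017) and a magnitude-based pruning mask. This is an illustrative reconstruction, not the paper's implementation; the function names, the cubic exponent, and the exact tie-breaking in the mask are assumptions.

```python
import numpy as np

def gmp_sparsity(step, start_step, end_step, final_sparsity, initial_sparsity=0.0):
    """Polynomial (cubic) sparsity schedule: ramps sparsity from
    initial_sparsity to final_sparsity between start_step and end_step."""
    if step < start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

def magnitude_prune_mask(weights, sparsity):
    """Return a boolean mask keeping the largest-magnitude weights,
    pruning exactly round(sparsity * size) of the smallest ones."""
    k = int(round(sparsity * weights.size))
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude weights get pruned.
        order = np.argsort(np.abs(weights).ravel())
        mask[order[:k]] = False
    return mask.reshape(weights.shape)
```

In a GMP training loop, `gmp_sparsity` would be queried every few updates and the resulting mask applied to each layer's weights, so the network is pruned gradually rather than all at once; SET differs in that it additionally regrows pruned connections (e.g. at random) after each prune step, keeping the sparsity level fixed while the topology evolves.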