🤖 AI Summary
This work challenges the necessity of neuron specialization for mitigating catastrophic forgetting in continual learning, revealing that specialization is primarily governed by network initialization rather than intrinsic task properties.
Method: Through theoretical analysis and empirical validation, the authors demonstrate that weight imbalance and high weight entropy actively induce localized representations, providing the first theoretical proof that specialization is not inherent to the task but contingent on initialization. They further derive a quantitative relationship between the degree of specialization and the initialization parameters, and show that non-specialized networks exhibit a monotonic relationship between task similarity and forgetting rate.
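As a concrete illustration of the weight-imbalance idea, one can initialize a two-layer network so that the layers start at very different scales. This is a minimal sketch under assumed conventions; the function name, the `scale_ratio` parameter, and the specific Gaussian scaling are illustrative, not the authors' exact procedure.

```python
import numpy as np

def imbalanced_init(n_in, n_hidden, n_out, scale_ratio=10.0, seed=None):
    """Illustrative imbalanced initialization for a two-layer network.

    The first layer is scaled up by `scale_ratio` relative to a balanced
    init, and the second layer is scaled down by the same factor, so the
    product of scales (and hence the initial function) stays comparable
    while the layer-wise weight norms are strongly imbalanced.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, np.sqrt(scale_ratio / n_in), size=(n_hidden, n_in))
    w2 = rng.normal(0.0, np.sqrt(1.0 / (scale_ratio * n_hidden)),
                    size=(n_out, n_hidden))
    return w1, w2
```

Varying `scale_ratio` gives a one-knob way to move between balanced and imbalanced initial conditions when probing how initialization affects specialization.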
Contribution/Results: Initializing networks in the specialized regime significantly improves the performance of Elastic Weight Consolidation (EWC), an effect the authors attribute to the initialization shaping the representation structure before training. These findings establish a novel theoretical foundation for regularization design in continual learning and yield principled guidelines for selecting initialization strategies.
📝 Abstract
Prior work has demonstrated a consistent tendency in neural networks engaged in continual learning tasks, wherein intermediate task similarity results in the highest levels of catastrophic interference. This phenomenon is attributed to the network's tendency to reuse learned features across tasks. However, this explanation relies heavily on the premise that neuron specialisation occurs, i.e. the emergence of localised representations. Our investigation challenges the validity of this assumption. Using theoretical frameworks for the analysis of neural networks, we show a strong dependence of specialisation on the initial condition. More precisely, we show that weight imbalance and high weight entropy can favour specialised solutions. We then apply these insights in the context of continual learning, first showing the emergence of a monotonic relation between task similarity and forgetting in non-specialised networks. Finally, we show that specialisation induced by weight imbalance is beneficial to the commonly employed elastic weight consolidation regularisation technique.
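For readers unfamiliar with EWC, the regulariser referenced above is, in its standard form, a quadratic penalty anchoring parameters to their values after a previous task, weighted by a diagonal Fisher-information estimate. Below is a minimal sketch of that penalty; the function name and arguments are illustrative and this is not tied to the paper's specific experimental setup.

```python
import numpy as np

def ewc_penalty(params, params_old, fisher, lam=1.0):
    """Standard EWC quadratic penalty (diagonal Fisher approximation):

        (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2

    where theta* are the parameters learned on the previous task and F_i
    weights each parameter by its estimated importance to that task.
    """
    return 0.5 * lam * float(np.sum(fisher * (params - params_old) ** 2))
```

During training on a new task, this term is added to the task loss, so parameters deemed important for earlier tasks (large `F_i`) are pulled back toward their old values more strongly.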