A Theory of Initialisation's Impact on Specialisation

📅 2025-03-04
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work challenges the necessity of neuron specialization for mitigating catastrophic forgetting in continual learning, showing that specialization is governed primarily by network initialization rather than by intrinsic task properties. Method: Combining theoretical analysis with empirical validation, the authors demonstrate that weight imbalance and high weight entropy actively induce localized representations, arguing that specialization is contingent rather than inherent. They further derive a quantitative relationship between the degree of specialization and the initialization parameters, and demonstrate a monotonic relationship between task similarity and forgetting even in non-specialized networks. Contribution/Results: Specialized initialization significantly enhances Elastic Weight Consolidation (EWC) performance, an effect attributable to the initialization-induced shaping of representation structure. These findings establish a theoretical foundation for regularization design in continual learning and yield principled guidelines for selecting initialization strategies.

📝 Abstract
Prior work has demonstrated a consistent tendency in neural networks engaged in continual learning tasks, wherein intermediate task similarity results in the highest levels of catastrophic interference. This phenomenon is attributed to the network's tendency to reuse learned features across tasks. However, this explanation heavily relies on the premise that neuron specialisation occurs, i.e. the emergence of localised representations. Our investigation challenges the validity of this assumption. Using theoretical frameworks for the analysis of neural networks, we show a strong dependence of specialisation on the initial condition. More precisely, we show that weight imbalance and high weight entropy can favour specialised solutions. We then apply these insights in the context of continual learning, first showing the emergence of a monotonic relation between task-similarity and forgetting in non-specialised networks. Finally, we show that specialisation by weight imbalance is beneficial for the commonly employed elastic weight consolidation regularisation technique.
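To make the abstract's notion of "weight imbalance" concrete: a simple, hypothetical way to realise it in a two-layer linear network is to initialise the layers at different scales, so that the difference between the layers' squared Frobenius norms is far from zero. This difference is a quantity known to be conserved under gradient flow in linear networks, which is why the initial imbalance can persist and shape the learned solution. The sketch below is illustrative only; the dimensions, scales, and function names are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 10, 20, 5  # toy dimensions (assumed)

def init_two_layer(scale1, scale2):
    """Two-layer linear net; per-layer scales control the weight imbalance."""
    W1 = scale1 * rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
    W2 = scale2 * rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
    return W1, W2

def imbalance(W1, W2):
    # Difference of squared Frobenius norms; ~0 for a balanced init,
    # conserved under gradient flow in linear networks.
    return np.sum(W2 ** 2) - np.sum(W1 ** 2)

W1, W2 = init_two_layer(scale1=1.0, scale2=3.0)    # imbalanced init
W1b, W2b = init_two_layer(scale1=1.0, scale2=1.0)  # roughly balanced init
print(imbalance(W1, W2), imbalance(W1b, W2b))
```

With the larger second-layer scale, the imbalance is strongly positive, whereas the balanced initialisation keeps it close to zero.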
Problem

Research questions and friction points this paper is trying to address.

Examines the impact of initialization on neural network specialization.
Challenges the assumption that neuron specialization underlies forgetting in continual learning.
Explores how weight imbalance and weight entropy shape the task-similarity–forgetting relationship.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes the impact of initialization on neural network specialization
Links weight imbalance and high weight entropy to specialized solutions
Improves elastic weight consolidation via imbalance-induced specialization
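The innovation above involves Elastic Weight Consolidation (EWC), the regulariser the paper pairs with specialised initialisation. As a rough reminder of what EWC computes (a minimal sketch with hypothetical names, not the paper's implementation), the method adds a quadratic penalty anchored at the previous task's weights, scaled by a diagonal Fisher-information estimate:

```python
import numpy as np

def ewc_loss(task_loss, params, old_params, fisher, lam=1.0):
    """Current-task loss plus the EWC quadratic penalty.

    fisher approximates the diagonal Fisher information from the
    previous task; lam trades off plasticity vs. stability.
    (Illustrative names only -- not the paper's code.)
    """
    penalty = sum(
        np.sum(f * (p - p_old) ** 2)
        for f, p, p_old in zip(fisher, params, old_params)
    )
    return task_loss + 0.5 * lam * penalty

# Toy usage with a single weight vector.
params = [np.array([1.0, 2.0])]
old_params = [np.array([0.5, 2.0])]
fisher = [np.array([4.0, 1.0])]
total = ewc_loss(0.3, params, old_params, fisher, lam=2.0)
# penalty = 4*(0.5)^2 + 1*0 = 1.0, so total = 0.3 + 0.5*2*1.0 = 1.3
```

The paper's claim is that a specialised initialisation makes this anchoring more effective, since the penalty protects parameters whose importance the initialisation has already localised.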
Devon Jarvis
University of the Witwatersrand
Deep Learning Theory, Computational Neuroscience

Sebastian Lee
Flatiron Institute
Machine Learning

C. Dominé
Gatsby Computational Neuroscience Unit & Sainsbury Wellcome Centre, UCL

Andrew Saxe
Professor, Gatsby Unit & Sainsbury Wellcome Centre, UCL
Theoretical Neuroscience, Machine Learning, Psychology

Stefano Sarao Mannelli
Data Science and AI, Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg