🤖 AI Summary
This study investigates how last-layer weight resampling (“zapping”) and optimizer selection jointly govern learning-forgetting dynamics and task interference/synergy in continual and few-shot transfer learning. Using CNNs trained on handwritten character and natural image datasets, we conduct systematic experiments integrating task performance evaluation with parameter trajectory analysis. We find that zapping accelerates domain adaptation by up to 2.3×, but its efficacy critically depends on optimizer choice: Adam-family optimizers synergize with zapping to mitigate catastrophic forgetting and enhance cross-task knowledge reuse, whereas SGD often induces negative transfer. Crucially, we identify the optimizer-resampling coupling as a key mechanistic lever for balancing stability and plasticity in multi-task continual learning, a finding that establishes an interpretable, intervention-based pathway toward robust incremental learning.
📝 Abstract
Recent work in continual learning has highlighted the beneficial effect of resampling the weights in the last layer of a neural network ("zapping"). Although empirical results demonstrate the effectiveness of this approach, the underlying mechanisms that drive these improvements remain unclear. In this work, we investigate in detail the patterns of learning and forgetting that take place inside a convolutional neural network when it is trained in challenging settings such as continual learning and few-shot transfer learning, on handwritten characters and natural images. Our experiments show that models that have undergone zapping during training recover more quickly from the shock of transferring to a new domain. Furthermore, to better observe the effect of continual learning in a multi-task setting, we measure how each individual task is affected. This shows that not only zapping but also the choice of optimizer can deeply affect the dynamics of learning and forgetting, causing complex patterns of synergy and interference between tasks to emerge when the model learns sequentially at transfer time.
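The zapping operation described above can be sketched in a few lines: resample only the final layer's weights from their initialization distribution while leaving earlier layers untouched. This is a minimal illustrative sketch, not the paper's implementation; the toy fully connected stack, the layer sizes, and the He-style Gaussian initialization are all assumptions made for the example.

```python
import random

def init_layer(n_in, n_out, rng):
    """He-style Gaussian initialization (an illustrative choice,
    not necessarily the scheme used in the paper)."""
    scale = (2.0 / n_in) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]

def zap_last_layer(weights, rng):
    """Resample the final layer's weights; earlier layers keep their
    trained values, so only the task-specific readout is reset."""
    n_out = len(weights[-1])
    n_in = len(weights[-1][0])
    weights[-1] = init_layer(n_in, n_out, rng)
    return weights

rng = random.Random(0)
# Toy 3-layer stack with sizes 8 -> 16 -> 16 -> 4 (hypothetical).
sizes = [(8, 16), (16, 16), (16, 4)]
weights = [init_layer(n_in, n_out, rng) for n_in, n_out in sizes]

frozen = [[row[:] for row in w] for w in weights[:-1]]  # copies of earlier layers
old_last = [row[:] for row in weights[-1]]              # copy of the last layer

zap_last_layer(weights, rng)

assert weights[:-1] == frozen  # earlier layers are untouched by zapping
```

In an actual training loop, a zap like this would be applied between episodes or before transfer, forcing the network to relearn its readout on top of the preserved feature extractor.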