🤖 AI Summary
Catastrophic forgetting in continual learning arises when gradient updates for new tasks overwrite previously acquired knowledge. To address this, we propose Prototype-Augmented Hypernetworks (PAH), the first framework to condition a hypernetwork on learnable task prototypes, generating task-specific classifier heads on demand without storing samples or heads. PAH introduces a dual-distillation loss: logit distillation preserves output consistency across tasks, while prototype-alignment distillation stabilizes the shared feature space. The model is jointly optimized with cross-entropy and both distillation losses. On Split-CIFAR100 and TinyImageNet, PAH achieves average accuracies of 74.5% and 63.7%, respectively, with forgetting rates of only 1.7% and 4.4%, significantly surpassing existing replay-free methods and establishing a lightweight paradigm for continual learning.
📝 Abstract
Continual learning (CL) aims to learn a sequence of tasks without forgetting prior knowledge, but gradient updates for a new task often overwrite the weights learned earlier, causing catastrophic forgetting (CF). We propose Prototype-Augmented Hypernetworks (PAH), a framework in which a single hypernetwork, conditioned on learnable task prototypes, dynamically generates task-specific classifier heads on demand. To mitigate forgetting, PAH combines cross-entropy with dual distillation losses, one to align logits and another to align prototypes, ensuring stable feature representations across tasks. Evaluations on Split-CIFAR100 and TinyImageNet demonstrate that PAH achieves state-of-the-art performance, reaching 74.5% and 63.7% accuracy with only 1.7% and 4.4% forgetting, respectively, surpassing prior methods without storing samples or classifier heads.
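To make the architecture concrete, here is a minimal NumPy sketch of the core idea described above: a hypernetwork maps a learnable task prototype to a flattened classifier head, and the training loss combines cross-entropy with two distillation terms. All dimensions, the linear hypernetwork, and the squared-error distillation targets are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
FEAT, CLASSES, PROTO = 8, 5, 4

# Learnable task prototype: one embedding per task (here, the current task).
prototype = rng.normal(size=PROTO)

# Hypernetwork (sketched as a single linear map): prototype ->
# flattened classifier head (weight matrix + bias) for that task.
H = rng.normal(scale=0.1, size=(FEAT * CLASSES + CLASSES, PROTO))

def generate_head(proto):
    """Generate a task-specific head from a prototype, on demand."""
    flat = H @ proto
    W = flat[: FEAT * CLASSES].reshape(CLASSES, FEAT)
    b = flat[FEAT * CLASSES:]
    return W, b

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Forward pass: shared-backbone features -> generated head -> logits.
features = rng.normal(size=FEAT)   # stand-in for backbone output
W, b = generate_head(prototype)
logits = W @ features + b

# Dual distillation (sketch): targets cached before the current update.
old_logits = logits + rng.normal(scale=0.01, size=CLASSES)
old_proto = prototype + rng.normal(scale=0.01, size=PROTO)

label = 2
ce = -np.log(softmax(logits)[label])                  # cross-entropy
logit_distill = np.mean((logits - old_logits) ** 2)   # output consistency
proto_align = np.mean((prototype - old_proto) ** 2)   # feature stability

loss = ce + logit_distill + proto_align
```

Because heads are regenerated from prototypes rather than stored, memory grows only with the (small) per-task prototype, which is the source of the "without storing samples or heads" claim.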