Routing without Forgetting

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of conventional parameter-efficient methods—such as prompt tuning and adapters—in online continual learning (OCL), where reliance on multi-step gradient optimization hinders performance under single-pass data streams. The authors reformulate continual learning in Transformers as a dynamic routing problem and, for the first time, introduce an energy-driven associative retrieval mechanism grounded in modern Hopfield networks. Operating without task identifiers and under the constraint of single-epoch observation, this approach generates input-adaptive prompts in a feedforward manner by minimizing a convex free energy functional, thereby eliminating the need for iterative optimization. Evaluated on class-incremental benchmarks including Split-ImageNet-R, the method significantly outperforms existing prompt-based techniques, demonstrating particularly strong performance in few-shot online continual learning scenarios.

📝 Abstract
Continual learning in transformers is commonly addressed through parameter-efficient adaptation: prompts, adapters, or LoRA modules are specialized per task while the backbone remains frozen. Although effective in controlled multi-epoch settings, these approaches rely on gradual gradient-based specialization and struggle in Online Continual Learning (OCL), where data arrive as a non-stationary stream and each sample may be observed only once. We recast continual learning in transformers as a routing problem: under strict online constraints, the model must dynamically select the appropriate representational subspace for each input without explicit task identifiers or repeated optimization. We thus introduce Routing without Forgetting (RwF), a transformer architecture augmented with energy-based associative retrieval layers inspired by Modern Hopfield Networks. Instead of storing or merging task-specific prompts, RwF generates dynamic prompts through single-step associative retrieval over the transformer token embeddings at each layer. Retrieval corresponds to the closed-form minimization of a strictly convex free-energy functional, enabling input-conditioned routing within each forward pass, independently of iterative gradient refinement. Across challenging class-incremental benchmarks, RwF improves over existing prompt-based methods. On Split-ImageNet-R and Split-ImageNet-S, RwF outperforms prior prompt-based approaches by a large margin, even in few-shot learning regimes. These results indicate that embedding energy-based associative routing directly within the transformer backbone provides a principled and effective foundation for OCL.
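The core mechanism described above, single-step associative retrieval in a modern Hopfield network, has a well-known closed form: the update is an attention-style softmax readout over stored patterns, which is the minimizer step of the underlying free-energy functional, so no iterative optimization is needed. The sketch below illustrates that retrieval rule in isolation; the pattern matrix, query, and `beta` temperature are illustrative stand-ins, not the paper's actual prompt pool or token embeddings.

```python
import numpy as np

def hopfield_retrieve(patterns, query, beta=1.0):
    """Single-step modern Hopfield retrieval.

    Computes xi_new = X @ softmax(beta * X^T @ xi). In modern Hopfield
    networks this softmax readout corresponds to the closed-form
    minimization step of a convex free energy over the attention weights,
    so retrieval completes in one forward pass with no gradient iterations.
    """
    scores = beta * patterns.T @ query                # similarity to each stored pattern
    scores -= scores.max()                            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
    return patterns @ weights                         # convex combination of stored patterns

# Toy example (hypothetical values): 3 stored prompt prototypes of dimension 4.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
q = np.array([0.9, 0.1, 0.0, 0.0])   # query embedding, closest to the first prototype
prompt = hopfield_retrieve(X, q, beta=8.0)
```

With a high inverse temperature `beta`, the retrieved vector collapses onto the nearest stored pattern; lower values blend prototypes, which is what makes the output an input-conditioned mixture rather than a hard lookup.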
Problem

Research questions and friction points this paper is trying to address.

Online Continual Learning
Transformer
Prompt-based Methods
Non-stationary Data Stream
Task-agnostic Routing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Continual Learning
Energy-based Associative Retrieval
Dynamic Prompting
Modern Hopfield Networks
Transformer Routing
Alessio Masano
University of Catania
Giovanni Bellitto
University of Catania
Dipam Goswami
Department of Computer Science, Universitat Autònoma de Barcelona; Computer Vision Center, Barcelona
Joost van de Weijer
Department of Computer Science, Universitat Autònoma de Barcelona; Computer Vision Center, Barcelona
Concetto Spampinato
University of Catania
Deep Learning · Artificial Intelligence · Computer Vision · Medical Image Analysis