AI Summary
In federated continual learning (FCL), foundation models suffer from weak task-specific adaptability and severe catastrophic forgetting because private local data are inaccessible to them. To address this, we propose the first large-small model collaborative framework, wherein lightweight, heterogeneous local small models dynamically bridge a global foundation model and evolving private task streams. Our approach integrates continual fine-tuning, one-by-one knowledge distillation, and federated learning to enable personalized small-model training and cross-client knowledge aggregation. Under strict data isolation and communication constraints, the framework significantly mitigates forgetting while improving both forward and backward transfer across tasks. Crucially, it exhibits strong robustness to structural heterogeneity among local small models. Extensive experiments on multiple FCL benchmarks demonstrate consistent superiority over state-of-the-art methods.
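The summary does not spell out the distillation objective, but knowledge distillation between a teacher and a student is conventionally a temperature-scaled KL divergence over their output logits. The following is a minimal, self-contained sketch of that standard loss (function names and the temperature value are illustrative, not taken from the paper):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor follows the classic distillation formulation so the
    gradient scale stays comparable across temperatures.
    """
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's softened predictions
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the student matches the teacher exactly the loss is zero, and it grows as their predictive distributions diverge, which is what makes it a usable transfer signal between a small local model and a large global one.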
Abstract
Continual learning (CL) for foundation models (FMs) is an essential yet underexplored challenge, especially in Federated Continual Learning (FCL), where each client learns from a private, evolving task stream under strict data and communication constraints. Despite their powerful generalization abilities, FMs often exhibit suboptimal performance on local downstream tasks because they cannot utilize private local data. Furthermore, enabling FMs to learn new tasks without forgetting prior knowledge is inherently difficult, primarily due to their immense parameter count and high model complexity. In contrast, small models can be trained locally under resource-constrained conditions and benefit from more mature CL techniques. To bridge the gap between small models and FMs, we propose the first collaborative framework in FCL, where lightweight local models act as a dynamic bridge, continually adapting to new tasks while enhancing the utility of the large model. The framework includes two novel components: Small Model Continual Fine-tuning prevents small models from temporal forgetting, while One-by-One Distillation performs personalized fusion of heterogeneous local knowledge on the server. Experimental results demonstrate the framework's superior performance, even when clients utilize heterogeneous small models.
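Because clients may hold structurally heterogeneous small models, the server cannot simply average parameters; One-by-One Distillation instead transfers each client's knowledge sequentially through its outputs. The abstract does not give the procedure's details, so the sketch below illustrates the idea only on a deliberately tiny stand-in (every name and the scalar-regression setup are hypothetical): each "model" is a single weight `w` predicting `w * x`, and the server visits the client teachers one by one, fitting its own weight to each teacher's outputs on a public proxy set:

```python
def one_by_one_distill(global_w, client_ws, xs, lr=0.1, epochs=5):
    """Toy sequential output-matching on a shared proxy set `xs`.

    `global_w` is the server model's weight; `client_ws` are the
    heterogeneous clients' weights, used only as black-box teachers.
    """
    w = global_w
    for cw in client_ws:          # visit client teachers one by one
        for _ in range(epochs):
            for x in xs:
                err = w * x - cw * x   # student output vs. teacher output
                w -= lr * err * x      # gradient step on 0.5 * err**2
    return w
```

With a modest number of steps per teacher, the server weight ends up as a blend of the clients' weights rather than a copy of the last one, which is the fusion behavior the component aims for; the real method operates on model outputs, not scalar weights.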