🤖 AI Summary
In federated learning (FL), continual instruction tuning of large multimodal models (LMMs) faces critical challenges: clients dynamically receive new tasks, exhibit extreme data heterogeneity, and operate under strict memory constraints, leading to catastrophic forgetting. To address this, we propose the first FL-oriented continual instruction tuning framework. Our method introduces (1) a dynamic knowledge organization mechanism that enables incremental cross-task knowledge integration, and (2) subspace-selective activation, which isolates parameters across tasks to preserve task-specific inference capabilities. Evaluated across 12 instruction-following datasets under four heterogeneous FL settings, our framework significantly mitigates forgetting and achieves state-of-the-art performance across all major metrics. To foster reproducibility and community advancement, we will publicly release both the source code and a comprehensive benchmark suite.
📝 Abstract
A vast amount of instruction tuning data is crucial for the impressive performance of Large Multimodal Models (LMMs), but the associated computational costs and data collection demands during supervised fine-tuning make such tuning impractical for most researchers. Federated learning (FL) has the potential to leverage all distributed data and training resources to reduce the overhead of joint training. However, most existing methods assume a fixed number of tasks, while in real-world scenarios, clients continuously encounter new knowledge and often struggle to retain old tasks due to memory constraints. In this work, we introduce the Federated Continual Instruction Tuning (FCIT) benchmark to model this real-world challenge. Our benchmark includes two realistic scenarios, encompassing four different settings and twelve carefully curated instruction tuning datasets. To address the challenges posed by FCIT, we propose dynamic knowledge organization, which effectively integrates updates from different tasks during training, and subspace-selective activation, which allocates task-specific outputs during inference. Extensive experimental results demonstrate that our proposed method significantly improves model performance across varying levels of data heterogeneity while mitigating catastrophic forgetting. Our source code and dataset will be made publicly available.
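The abstract names the two mechanisms but does not specify their implementation. As a rough illustration only, the sketch below shows one common way such ideas are realized: a bank of per-task low-rank subspaces that grows as new tasks arrive (loosely, "dynamic knowledge organization") and a similarity-based routing step that activates only one task's subspace at inference (loosely, "subspace-selective activation"). All names here (`SubspaceBank`, `add_task`, `select`) are hypothetical and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 16, 4  # hidden dimension and subspace rank (illustrative sizes)

class SubspaceBank:
    """Toy sketch: one low-rank subspace (A @ B) per task, selected at
    inference by cosine similarity between the input and a stored task key."""
    def __init__(self):
        self.keys = []  # unit-norm task key vectors, shape (D,)
        self.A = []     # low-rank factors, shape (D, R)
        self.B = []     # low-rank factors, shape (R, D)

    def add_task(self, key, A, B):
        # Append the new task's low-rank update without overwriting
        # earlier subspaces, so old-task knowledge is retained.
        self.keys.append(key / np.linalg.norm(key))
        self.A.append(A)
        self.B.append(B)

    def select(self, x):
        # Route the input to the task whose key is most similar to it.
        xn = x / np.linalg.norm(x)
        return int(np.argmax([k @ xn for k in self.keys]))

    def forward(self, x, W):
        # Activate only the selected task's subspace on top of the
        # frozen base weight W; other subspaces stay inert.
        t = self.select(x)
        return x @ (W + self.A[t] @ self.B[t])

bank = SubspaceBank()
W = rng.normal(size=(D, D)) * 0.01  # stand-in for a frozen base weight
for _ in range(3):  # three sequential tasks arriving over time
    bank.add_task(rng.normal(size=D),
                  rng.normal(size=(D, R)) * 0.1,
                  rng.normal(size=(R, D)) * 0.1)

x = rng.normal(size=D)
y = bank.forward(x, W)
print(y.shape)  # (16,)
```

This is a minimal single-vector sketch; the actual method operates on transformer layers in a federated setting with server-side aggregation, which is omitted here.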