Federated Continual Instruction Tuning

📅 2025-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In federated learning (FL), continual instruction tuning of large language models faces critical challenges: clients dynamically receive new tasks, exhibit extreme data heterogeneity, and operate under strict memory constraints—leading to catastrophic forgetting. To address this, we propose the first FL-oriented continual instruction tuning framework. Our method introduces (1) a dynamic knowledge organization mechanism that enables incremental cross-task knowledge integration, and (2) subspace-selective activation, which achieves multi-task parameter isolation to preserve task-specific inference capabilities. Evaluated across 12 instruction-following datasets under four heterogeneous FL settings, our framework significantly mitigates forgetting and achieves state-of-the-art performance across all major metrics. To foster reproducibility and community advancement, we will publicly release both the source code and a comprehensive benchmark suite.

Technology Category

Application Category

📝 Abstract
A vast amount of instruction tuning data is crucial for the impressive performance of Large Multimodal Models (LMMs), but the associated computational costs and data collection demands during supervised fine-tuning make it impractical for most researchers. Federated learning (FL) has the potential to leverage all distributed data and training resources to reduce the overhead of joint training. However, most existing methods assume a fixed number of tasks, while in real-world scenarios, clients continuously encounter new knowledge and often struggle to retain old tasks due to memory constraints. In this work, we introduce the Federated Continual Instruction Tuning (FCIT) benchmark to model this real-world challenge. Our benchmark includes two realistic scenarios, encompassing four different settings and twelve carefully curated instruction tuning datasets. To address the challenges posed by FCIT, we propose dynamic knowledge organization to effectively integrate updates from different tasks during training and subspace selective activation to allocate task-specific output during inference. Extensive experimental results demonstrate that our proposed method significantly enhances model performance across varying levels of data heterogeneity and catastrophic forgetting. Our source code and dataset will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Addresses high computational costs in instruction tuning for LMMs.
Solves catastrophic forgetting in federated continual learning scenarios.
Proposes methods to manage data heterogeneity in distributed training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning for distributed data utilization
Dynamic knowledge organization for task integration
Subspace selective activation for task-specific outputs
🔎 Similar Papers
No similar papers found.