Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address catastrophic forgetting in federated fine-tuning (FedFT) under continual learning, especially in highly heterogeneous distributed settings with non-IID data and divergent device capabilities, this paper proposes FedBE, an adaptive Transformer block expansion framework. Methodologically, FedBE introduces a scalable, allocatable dynamic Transformer block mechanism that structurally isolates newly acquired knowledge from previously learned representations, and it dynamically assigns trainable blocks to each client based on local data distribution and computational resource constraints. This design eases the trade-off in parameter-efficient fine-tuning (PEFT) for federated settings between forgetting mitigation and generalization capability. Extensive experiments demonstrate that FedBE achieves 12–74% higher accuracy retention on general tasks, accelerates convergence by 1.9–3.1×, and preserves downstream task performance without degradation.

📝 Abstract
Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments while ensuring data privacy. Existing FedFT methods predominantly utilize parameter-efficient fine-tuning (PEFT) techniques to reduce communication and computation overhead. However, they often fail to adequately address catastrophic forgetting, a critical challenge arising from continual adaptation in distributed environments. Traditional centralized fine-tuning methods, which are not designed for the heterogeneous and privacy-constrained nature of federated environments, struggle to mitigate this issue effectively. Moreover, the challenge is further exacerbated by significant variation in data distributions and device capabilities across clients, which leads to intensified forgetting and degraded model generalization. To tackle these issues, we propose FedBE, a novel FedFT framework that integrates an adaptive transformer block expansion mechanism with a dynamic trainable-block allocation strategy. Specifically, FedBE expands trainable blocks within the model architecture, structurally separating newly learned task-specific knowledge from the original pre-trained representations. Additionally, FedBE dynamically assigns these trainable blocks to clients based on their data distributions and computational capabilities. This enables the framework to better accommodate heterogeneous federated environments and enhances the generalization ability of the model. Extensive experiments show that, compared with existing federated fine-tuning methods, FedBE achieves 12–74% higher accuracy retention on general tasks after fine-tuning and a convergence acceleration ratio of 1.9–3.1× without degrading the accuracy of downstream tasks.
Problem

Research questions and friction points this paper is trying to address.

Address catastrophic forgetting in federated fine-tuning of LLMs
Handle heterogeneous data distributions in federated environments
Improve model generalization and accuracy retention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive transformer block expansion mechanism
Dynamic trainable-block allocation strategy
Structural separation of task-specific knowledge
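The paper releases no code here, but the two contributions above can be sketched in plain Python under assumed simplifications: blocks are modeled as dataclass records rather than real Transformer layers, the even-interleaving placement policy is a guess, and all names (`Block`, `expand_model`, `allocate_blocks`) are illustrative, not the paper's API.

```python
# Hedged sketch of FedBE's two ideas: (1) expand a frozen pre-trained stack
# with new trainable blocks, structurally separating new task knowledge from
# the original representations; (2) allocate trainable blocks per client
# according to its compute budget. Placement and allocation policies are
# assumptions, not the paper's actual design.
from dataclasses import dataclass


@dataclass
class Block:
    trainable: bool  # False = frozen pre-trained block, True = expanded block
    name: str


def expand_model(pretrained_depth: int, num_new_blocks: int) -> list[Block]:
    """Freeze the pre-trained stack and interleave new trainable blocks."""
    model = [Block(trainable=False, name=f"pretrained_{i}")
             for i in range(pretrained_depth)]
    # Spread the new blocks evenly through the stack (assumed placement policy).
    stride = max(1, pretrained_depth // max(1, num_new_blocks))
    for k in range(num_new_blocks):
        pos = min(len(model), (k + 1) * stride + k)
        model.insert(pos, Block(trainable=True, name=f"expanded_{k}"))
    return model


def allocate_blocks(client_capacity: dict[str, int],
                    total_new_blocks: int) -> dict[str, int]:
    """Assign each client a trainable-block budget, capped by the expansion
    size, so resource-poor clients train fewer blocks than capable ones."""
    return {cid: min(cap, total_new_blocks)
            for cid, cap in client_capacity.items()}
```

For example, expanding a 12-block model with 3 trainable blocks yields a 15-block stack in which only the 3 inserted blocks receive gradient updates, while an allocation over clients with capacities {1, 8} grants them 1 and 3 trainable blocks respectively.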
Yujia Huo
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China, 230027, and also with Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, China, 215123
Jianchun Liu
University of Science and Technology of China
Edge Computing, Federated Learning, Model Inference
Hongli Xu
University of Science and Technology of China
Software Defined Networking, Cooperative Communication, Sensor Networks
Zhenguo Ma
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, China, 221116, and also with Mine Digitization Engineering Research Center of the Ministry of Education
Shilong Wang
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China, 230027, and also with Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, China, 215123
Liusheng Huang
Professor of Computer Science, University of Science and Technology of China
Wireless Networks, Information Security