🤖 AI Summary
To address slow convergence and poor robustness in federated fine-tuning of language models under non-IID data, this paper proposes a targeted layer update mechanism. It introduces layer importance scoring, based on gradient sensitivity and task consistency, to dynamically identify critical network layers; only the parameters of those layers are updated while the others are frozen, enabling structured sparse updates and suppressing malicious or noisy client updates. Integrated into an enhanced FedAvg framework, the method incurs no additional communication overhead. Evaluated across multiple non-IID text classification tasks, it accelerates convergence by 23–41% and improves final accuracy by 2.8–5.3 percentage points on average, significantly outperforming existing baselines. The core contribution is the first integration of layer-wise importance modeling with federated fine-tuning, jointly optimizing efficiency, accuracy, and robustness.
📝 Abstract
Federated learning (FL) addresses privacy concerns in training language models by enabling multiple clients to contribute to training without sharing their raw data. However, non-IID (not independent and identically distributed) data across clients often limits FL's performance. This issue is especially challenging during model fine-tuning, as noise arising from variations in clients' data distributions can harm model convergence near stationary points. This paper proposes a targeted layer update strategy for fine-tuning in FL. Instead of randomly updating layers of the language model, as is often done in practice, we use a scoring mechanism to identify and update the most critical layers, avoiding excessively noisy or even poisoned updates by freezing the parameters in the other layers. We show in extensive experiments that our method improves convergence and performance in non-IID settings, offering a more efficient approach to fine-tuning federated language models.
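To make the mechanism concrete, below is a minimal sketch of the idea described above: score each layer's importance, update only the top-scoring layers, and aggregate with FedAvg while frozen layers keep their global values. This is an illustrative simplification, not the paper's implementation: the importance score here uses only gradient magnitude (the paper combines gradient sensitivity with task consistency), and all function names, the top-`k` selection rule, and the list-based parameter representation are assumptions for illustration.

```python
def layer_importance(grads):
    """Score each layer by gradient sensitivity, approximated here as the
    mean squared gradient magnitude. `grads` maps layer name -> list of
    gradient values from a local client step. (Simplified proxy; the paper
    also incorporates a task-consistency term.)"""
    return {name: sum(x * x for x in g) / len(g) for name, g in grads.items()}

def select_layers(scores, k):
    """Keep the k highest-scoring layers; all other layers stay frozen.
    (Top-k selection is an assumed rule for this sketch.)"""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def targeted_fedavg(client_params, selected, global_params):
    """FedAvg restricted to the selected layers: frozen layers retain their
    global values, so their updates never need to be communicated."""
    merged = {}
    for name, value in global_params.items():
        if name in selected:
            # Average this layer's parameters across clients (plain FedAvg).
            merged[name] = [
                sum(c[name][i] for c in client_params) / len(client_params)
                for i in range(len(value))
            ]
        else:
            # Frozen layer: keep the current global parameters unchanged.
            merged[name] = list(value)
    return merged
```

For example, a server round would score layers from client gradient reports, call `select_layers` with a small `k`, and then apply `targeted_fedavg` so that noisy or poisoned updates to unimportant layers are simply discarded.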