Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection

📅 2025-05-31
📈 Citations: 1
Influential: 0
🤖 AI Summary
Resource-constrained devices and data heterogeneity in federated learning (FL) hinder the practical deployment of parameter-efficient fine-tuning (PEFT). Method: This paper proposes a lightweight PEFT framework tailored for multi-head attention language models, featuring: (1) a confidence-based attention-head importance scoring mechanism; (2) head-level sparse pruning; (3) head-specific weighted aggregation; and (4) importance-driven collaborative client selection. Results: Evaluated with a T5-small model on multi-task benchmarks including MultiNLI, the framework attains up to 90% head sparsity, reducing communication overhead by 1.8× (to about 55.6% of the baseline), cutting training computation by 3.9×, and keeping the accuracy drop under 2%. These improvements substantially enhance the practicality and scalability of PEFT in FL settings.
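The scoring and pruning steps in the summary can be sketched as follows. This is a minimal illustration, assuming a head's "confidence" is the mean of its maximum attention probability over tokens and examples; the exact scoring formula, and the `head_importance`/`prune_mask` names, are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def head_importance(attn_probs):
    """Confidence-based importance score per head (assumed scoring rule):
    mean of each head's maximum attention probability.

    attn_probs: shape (batch, heads, query_len, key_len),
                rows sum to 1 along the last axis.
    """
    confidence = attn_probs.max(axis=-1)   # (batch, heads, query_len)
    return confidence.mean(axis=(0, 2))    # (heads,)

def prune_mask(scores, sparsity=0.9):
    """Keep only the top (1 - sparsity) fraction of heads by score."""
    n_keep = max(1, int(round(len(scores) * (1 - sparsity))))
    keep = np.argsort(scores)[::-1][:n_keep]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

# toy example: 8 heads, random attention distributions
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8, 5, 5))
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
scores = head_importance(probs)
mask = prune_mask(scores, sparsity=0.75)   # keeps 2 of 8 heads
```

At 90% sparsity on a model with many heads, only the few highest-scoring heads keep their (LoRA) adapters trainable and communicated, which is where the 1.8× communication and 3.9× compute savings come from.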

📝 Abstract
Parameter Efficient Fine-Tuning (PEFT) has become the de-facto approach in adapting Large Language Models (LLMs) for downstream tasks in Natural Language Processing. However, its adoption in privacy-preserving distributed learning frameworks, such as Federated Learning (FL), remains relatively limited. This is mainly due to challenges specific to FL, such as resource-constrained devices and diverse data distributions among clients. In this paper, we propose an efficient method to perform PEFT within the FL framework for Multi-Head Attention (MHA) based language models. We address the challenges through head pruning, a novel head-specific weighted aggregation mechanism, and a client selection strategy. Head pruning minimizes training complexity within the clients, guided by the importance score computed based on the confidence of the attention head. Weighted aggregation of heads ensures the global model captures crucial updates from diverse clients complementing our client selection strategy. We show results on the MultiNLI benchmark along with 20 Newsgroups, XL-Sum, and E2E NLG datasets. We use the MultiNLI dataset and T5-small model with LoRA as our PEFT method, attaining sparsity levels of up to 90%, resulting in a communication advantage of up to 1.8x and a reduction in training OPs of 3.9x while maintaining the accuracy drop under 2%.

Problem

Research questions and friction points this paper is trying to address.

Efficient PEFT in Federated Learning for MHA models
Addressing FL challenges via head pruning and client selection
Maintaining accuracy while reducing communication and training costs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Head pruning reduces training complexity.
Weighted aggregation captures diverse client updates.
Strategic client selection enhances efficiency.
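The aggregation and selection ideas above can be sketched together. This is a hedged sketch under assumed forms: per-head aggregation weights proportional to each client's head-importance score, and client selection by total head importance; `aggregate_heads` and `select_clients` are illustrative names, and the paper's actual weighting and selection rules may differ.

```python
import numpy as np

def aggregate_heads(client_updates, client_scores):
    """Head-specific weighted aggregation (assumed form): each head's
    global update is the importance-weighted average of the clients'
    updates for that head.

    client_updates: list of arrays, each (heads, dim) - per-head deltas
    client_scores:  list of arrays, each (heads,)    - per-head importance
    """
    updates = np.stack(client_updates)   # (clients, heads, dim)
    scores = np.stack(client_scores)     # (clients, heads)
    weights = scores / scores.sum(axis=0, keepdims=True)  # normalize per head
    return (weights[..., None] * updates).sum(axis=0)     # (heads, dim)

def select_clients(client_scores, k):
    """Importance-driven selection (assumed rule): pick the k clients
    with the largest total head importance."""
    totals = np.array([s.sum() for s in client_scores])
    return np.argsort(totals)[::-1][:k].tolist()
```

Normalizing the weights per head (rather than per client) lets a client that is confident about only a few heads still dominate the global update for exactly those heads, which is how diverse client updates are captured.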