Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices

📅 2024-11-12
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address excessive memory and computational overhead when fine-tuning large Transformer models via federated learning (FL) on resource-constrained edge devices, this paper proposes a lightweight hierarchical fine-tuning framework. The method introduces: (1) a novel dynamic layer selection strategy tailored for heterogeneous devices, which adaptively activates optimal network layers based on device-specific compute and memory constraints; and (2) a synergistic integration of parameter-efficient fine-tuning (PEFT), hierarchical freezing, and device-aware activation, enabling fine-grained resource adaptation on a Tiny Transformer architecture. Experiments demonstrate stable training under stringent memory and FLOPs constraints, communication overhead comparable to LoRA, and significant reductions in both FLOPs and GPU memory consumption. Moreover, the approach achieves superior accuracy over existing state-of-the-art methods in cross-device FL settings.

πŸ“ Abstract
In recent years, Large Language Models (LLMs) through Transformer structures have dominated many machine learning tasks, especially text processing. However, these models require massive amounts of data for training and induce high resource requirements, particularly in terms of the large number of Floating Point Operations (FLOPs) and the high amounts of memory needed. To fine-tune such a model in a parameter-efficient way, techniques like Adapter or LoRA have been developed. However, we observe that the application of LoRA, when used in federated learning (FL), while still being parameter-efficient, is memory and FLOP inefficient. Based on that observation, we develop a novel layer finetuning scheme that allows devices in cross-device FL to make use of pretrained neural networks (NNs) while adhering to given resource constraints. We show that our presented scheme outperforms the current state of the art when dealing with homogeneous or heterogeneous computation and memory constraints and is on par with LoRA regarding limited communication, thereby achieving significantly higher accuracies in FL training.
Problem

Research questions and friction points this paper is trying to address.

Efficient fine-tuning of tiny Transformers on resource-limited devices
Addressing memory and FLOP inefficiency in federated LoRA adaptation
Optimizing federated learning under heterogeneous computation constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel layer finetuning for resource-constrained FL
Efficient use of pretrained NNs in FL
Outperforms LoRA in memory and FLOP efficiency
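
The memory and FLOP argument above can be illustrated with a toy cost model. This is a minimal sketch, not the paper's algorithm: the `Layer` class, the `backward_footprint` and `max_trainable_suffix` helpers, and all numbers are hypothetical. It relies only on the general fact that backpropagation must reach the earliest trainable parameter, so adapters attached to every layer force every layer to keep its activations, while training only the last few layers lets the earlier ones run forward-only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    """Toy per-layer cost in arbitrary units (illustrative numbers only)."""
    activation_mem: int  # memory needed to keep this layer's activations for backward


def backward_footprint(layers: List[Layer], first_trainable: int) -> Dict[str, int]:
    """Cost of one training step when the earliest trainable parameter
    sits in layer `first_trainable` (0-based).

    Gradients must flow back to that layer, so every layer from there to
    the output keeps its activations. Earlier layers only run forward.
    """
    touched = layers[first_trainable:]
    return {
        "layers_in_backward": len(touched),
        "activation_mem": sum(layer.activation_mem for layer in touched),
    }


def max_trainable_suffix(layers: List[Layer], mem_budget: int) -> int:
    """Hypothetical heuristic: the largest k such that training the last
    k layers fits within `mem_budget` units of activation memory."""
    used = 0
    k = 0
    for layer in reversed(layers):
        if used + layer.activation_mem > mem_budget:
            break
        used += layer.activation_mem
        k += 1
    return k


# A 12-layer model where each layer stores 10 units of activations.
model = [Layer(activation_mem=10) for _ in range(12)]

# Adapters (for example LoRA) on every layer: the first trainable
# parameter is in layer 0, so the backward pass covers all 12 layers.
adapters_everywhere = backward_footprint(model, first_trainable=0)

# Only the last 3 layers are trained: the backward pass stops early.
last_three = backward_footprint(model, first_trainable=len(model) - 3)

# A device with 45 units of memory can afford to train the last 4 layers.
k = max_trainable_suffix(model, mem_budget=45)
```

In this toy setting, a device with a tighter budget simply gets a smaller `k`, which is one simple way to picture heterogeneous constraints. The actual selection scheme in the paper may differ.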
Kilian Pfeiffer
Karlsruhe Institute of Technology, Karlsruhe, Germany
Mohamed Aboelenien Ahmed
Karlsruhe Institute of Technology, Karlsruhe, Germany
R. Khalili
Huawei Research Center, Munich, Germany
Jörg Henkel
Professor of Computer Science, Karlsruhe Institute of Technology
Embedded Systems, Systems-on-Chip, Dependable Systems, Low Power Design, Thermal Design