From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices

📅 2025-07-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the gap in systematic research on parameter-efficient fine-tuning (PEFT) for resource-constrained small convolutional neural networks (CNNs) deployed on edge devices. Focusing on distribution shift and incremental learning of novel classes, it conducts the first comprehensive evaluation of prominent PEFT methods—including LoRA, DoRA, GaLore, and adapter-based approaches—on both standard and depthwise separable CNN architectures. Experimental results reveal that, on depthwise separable convolutions, LoRA, DoRA, and GaLore are only half as memory-efficient as they are when applied to large language models; conversely, adapter variants reduce trainable FLOPs during model updates by up to 95%. Leveraging PyTorch performance profiling, the work uncovers principled adaptation patterns of PEFT in lightweight CNNs. These findings establish a deployable, low-overhead optimization paradigm for continual model updates on edge hardware.

📝 Abstract
Parameter-efficient fine-tuning (PEFT) methods reduce the computational costs of updating deep learning models by minimizing the number of additional parameters used to adapt a model to a downstream task. While extensively researched in large language models (LLMs), their application to smaller models used on edge devices, such as convolutional neural networks, remains underexplored. This paper benchmarks and analyzes popular PEFT methods on convolutional architectures typically deployed in resource-constrained edge environments. We evaluate LoRA, DoRA, and GaLore for updating standard and depthwise convolutional architectures to handle distribution shifts and accommodate unseen classes. We utilize recently proposed PyTorch profilers to compare the updated model performance and computational costs of these PEFT methods with traditional fine-tuning approaches. With resource efficiency in mind, we investigate their update behavior across different rank dimensions. We find that the evaluated PEFT methods are only half as memory-efficient when applied to depthwise-separable convolution architectures, compared to their efficiency with LLMs. Conversely, when targeting convolutional architectures optimized for edge deployment, adapter-based PEFT methods can reduce floating point operations (FLOPs) during model updates by up to 95%. These insights offer valuable guidance for selecting PEFT methods based on hardware constraints, performance requirements, and application needs. Our code is online.
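To make the low-rank adaptation idea concrete, the sketch below shows one way LoRA can be attached to a pointwise (1×1) convolution, the kind found in depthwise-separable blocks. This is a minimal illustration under assumed hyperparameters (rank `r`, scaling `alpha`), not the paper's implementation: the frozen base weight W is adapted as W + (B·A)·(alpha/r), and only the low-rank factors A and B are trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAConv2d(nn.Module):
    """Illustrative LoRA wrapper for a 1x1 convolution (hypothetical, not the paper's code).
    The pretrained weight is frozen; only low-rank factors A and B are trainable."""
    def __init__(self, base: nn.Conv2d, r: int = 4, alpha: float = 8.0):
        super().__init__()
        assert base.kernel_size == (1, 1), "this sketch covers pointwise convs only"
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained convolution
        # Low-rank factors: B (out x r) starts at zero so the update is initially a no-op
        self.lora_A = nn.Parameter(torch.randn(r, base.in_channels) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_channels, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reshape the rank-r product into a 1x1 conv kernel and add its output
        delta_w = (self.lora_B @ self.lora_A).view(
            self.base.out_channels, self.base.in_channels, 1, 1)
        return self.base(x) + F.conv2d(
            x, delta_w * self.scaling,
            stride=self.base.stride, padding=self.base.padding)

# Example: adapt a 32 -> 64 channel pointwise conv with rank 4
base = nn.Conv2d(32, 64, kernel_size=1)
layer = LoRAConv2d(base, r=4)
out = layer(torch.randn(2, 32, 8, 8))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
# Trainable parameters: r*in + out*r = 4*32 + 64*4 = 384, versus 2112 for full fine-tuning
```

Because 1×1 convolutions are already narrow compared to the dense projections in LLMs, the relative savings of such a low-rank update shrink, which is consistent with the paper's finding that PEFT is less memory-efficient on depthwise-separable architectures.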
Problem

Research questions and friction points this paper is trying to address.

Evaluating PEFT methods for edge device convolutional networks
Comparing memory and FLOP efficiency of PEFT techniques
Analyzing PEFT performance on resource-constrained edge architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient fine-tuning for edge devices
Benchmarking PEFT methods on convolutional architectures
Adapter-based PEFT reduces FLOPs by up to 95%
Georg Slamanig
Graz University of Technology, Austria
Francesco Corti
Graz University of Technology, Austria
Olga Saukh
TU Graz / CSH Vienna
embedded intelligence, machine learning, deep learning, sensing, edge AI