🤖 AI Summary
This study addresses the gap in systematic research on parameter-efficient fine-tuning (PEFT) for resource-constrained small convolutional neural networks (CNNs) deployed on edge devices. Focusing on distribution shift and incremental learning of novel classes, it conducts a comprehensive evaluation of prominent PEFT methods—including LoRA, DoRA, GaLore, and adapter-based approaches—on both standard and depthwise-separable CNN architectures. Experimental results reveal that on depthwise-separable convolutions, LoRA, DoRA, and GaLore are only about half as memory-efficient as they are in the large-language-model setting; conversely, adapter-based variants can reduce trainable FLOPs during model updates by up to 95%. Leveraging recently introduced PyTorch performance profilers, the work characterizes how PEFT methods behave when adapting lightweight CNNs across different rank dimensions. These findings offer practical guidance for selecting low-overhead update strategies for continual model updates on edge hardware.
📝 Abstract
Parameter-efficient fine-tuning (PEFT) methods reduce the computational costs of updating deep learning models by minimizing the number of additional parameters used to adapt a model to a downstream task. While extensively researched in large language models (LLMs), their application to smaller models used on edge devices, such as convolutional neural networks, remains underexplored. This paper benchmarks and analyzes popular PEFT methods on convolutional architectures typically deployed in resource-constrained edge environments. We evaluate LoRA, DoRA, and GaLore for updating standard and depthwise convolutional architectures to handle distribution shifts and accommodate unseen classes. We utilize recently proposed PyTorch profilers to compare the updated model performance and computational costs of these PEFT methods with traditional fine-tuning approaches. With resource efficiency in mind, we investigate their update behavior across different rank dimensions. We find that the evaluated PEFT methods are only half as memory-efficient when applied to depthwise-separable convolution architectures, compared to their efficiency with LLMs. Conversely, when targeting convolutional architectures optimized for edge deployment, adapter-based PEFT methods can reduce floating point operations (FLOPs) during model updates by up to 95%. These insights offer valuable guidance for selecting PEFT methods based on hardware constraints, performance requirements, and application needs. Our code is online.
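The intuition behind the memory-efficiency finding can be illustrated with a back-of-envelope parameter count. A LoRA-style update flattens a convolution weight into a matrix and learns a rank-r factorization B @ A in its place; the relative savings shrink as the flattened weight gets smaller, which is exactly the situation with the 1x1 pointwise convolutions that dominate depthwise-separable blocks. The sketch below is illustrative only: the channel counts, rank, and helper names are assumptions for the example, not values from the paper.

```python
# Hedged illustration (not the paper's code): trainable-parameter counts for
# full fine-tuning vs. a LoRA-style rank-r update on convolution layers.
# LoRA treats the conv weight as an (out_ch, in_ch * kH * kW) matrix and
# learns B: (out_ch, r) times A: (r, in_ch * kH * kW) instead.

def conv_params(out_ch: int, in_ch: int, k: int) -> int:
    """Trainable weights in a standard k x k convolution (bias ignored)."""
    return out_ch * in_ch * k * k

def lora_params(out_ch: int, in_ch: int, k: int, r: int) -> int:
    """Trainable weights in a rank-r LoRA update of the same layer."""
    flat_in = in_ch * k * k
    return out_ch * r + r * flat_in

# Standard 3x3 conv, 256 -> 256 channels:
full = conv_params(256, 256, 3)          # 589,824 weights
lora = lora_params(256, 256, 3, r=8)     # 2,048 + 18,432 = 20,480

# Depthwise-separable block: the 1x1 pointwise conv carries most weights.
pointwise = conv_params(256, 256, 1)     # 65,536 weights
pw_lora = lora_params(256, 256, 1, r=8)  # 2,048 + 2,048 = 4,096

print(f"standard 3x3: LoRA keeps {lora / full:.1%} of weights trainable")
print(f"1x1 pointwise: LoRA keeps {pw_lora / pointwise:.1%}")
```

Running this shows the rank-8 update is ~3.5% of the full weights for the standard 3x3 convolution but ~6.3% for the pointwise convolution: the relative savings roughly halve, consistent in spirit with the abstract's observation that these PEFT methods are less memory-efficient on depthwise-separable architectures.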