🤖 AI Summary
This study addresses the gap in systematic research on parameter-efficient fine-tuning (PEFT) for resource-constrained small convolutional neural networks (CNNs) deployed on edge devices. Focusing on distribution shift and incremental learning of novel classes, it conducts a comprehensive evaluation of prominent PEFT methods—including LoRA, DoRA, GaLore, and adapter-based approaches—on both standard and depthwise-separable CNN architectures. Experimental results reveal that on depthwise-separable convolutions, LoRA, DoRA, and GaLore are only about half as memory-efficient as they are in the large-language-model setting; conversely, adapter-based variants can reduce trainable FLOPs during model updates by up to 95%. Leveraging recently introduced PyTorch performance profilers, the work characterizes how PEFT methods behave when adapting lightweight CNNs across different rank dimensions. These findings offer practical guidance for selecting low-overhead update strategies for continual model updates on edge hardware.
📝 Abstract
Parameter-efficient fine-tuning (PEFT) methods reduce the computational costs of updating deep learning models by minimizing the number of additional parameters used to adapt a model to a downstream task. While extensively researched in large language models (LLMs), their application to smaller models used on edge devices, such as convolutional neural networks, remains underexplored. This paper benchmarks and analyzes popular PEFT methods on convolutional architectures typically deployed in resource-constrained edge environments. We evaluate LoRA, DoRA, and GaLore for updating standard and depthwise convolutional architectures to handle distribution shifts and accommodate unseen classes. We utilize recently proposed PyTorch profilers to compare the updated model performance and computational costs of these PEFT methods with traditional fine-tuning approaches. With resource efficiency in mind, we investigate their update behavior across different rank dimensions. We find that the evaluated PEFT methods are only half as memory-efficient when applied to depthwise-separable convolution architectures, compared to their efficiency with LLMs. Conversely, when targeting convolutional architectures optimized for edge deployment, adapter-based PEFT methods can reduce floating point operations (FLOPs) during model updates by up to 95%. These insights offer valuable guidance for selecting PEFT methods based on hardware constraints, performance requirements, and application needs. Our code is online.
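The intuition behind the memory-efficiency finding can be illustrated with a back-of-envelope parameter count. A LoRA-style update flattens a convolution weight into a matrix and learns a rank-r factorization B @ A in its place; the relative savings shrink as the flattened weight gets smaller, which is exactly the situation with the 1x1 pointwise convolutions that dominate depthwise-separable blocks. The sketch below is illustrative only: the channel counts, rank, and helper names are assumptions for the example, not values from the paper.

```python
# Hedged illustration (not the paper's code): trainable-parameter counts for
# full fine-tuning vs. a LoRA-style rank-r update on convolution layers.
# LoRA treats the conv weight as an (out_ch, in_ch * kH * kW) matrix and
# learns B: (out_ch, r) times A: (r, in_ch * kH * kW) instead.

def conv_params(out_ch: int, in_ch: int, k: int) -> int:
    """Trainable weights in a standard k x k convolution (bias ignored)."""
    return out_ch * in_ch * k * k

def lora_params(out_ch: int, in_ch: int, k: int, r: int) -> int:
    """Trainable weights in a rank-r LoRA update of the same layer."""
    flat_in = in_ch * k * k
    return out_ch * r + r * flat_in

# Standard 3x3 conv, 256 -> 256 channels:
full = conv_params(256, 256, 3)          # 589,824 weights
lora = lora_params(256, 256, 3, r=8)     # 2,048 + 18,432 = 20,480

# Depthwise-separable block: the 1x1 pointwise conv carries most weights.
pointwise = conv_params(256, 256, 1)     # 65,536 weights
pw_lora = lora_params(256, 256, 1, r=8)  # 2,048 + 2,048 = 4,096

print(f"standard 3x3: LoRA keeps {lora / full:.1%} of weights trainable")
print(f"1x1 pointwise: LoRA keeps {pw_lora / pointwise:.1%}")
```

Running this shows the rank-8 update is ~3.5% of the full weights for the standard 3x3 convolution but ~6.3% for the pointwise convolution: the relative savings roughly halve, consistent in spirit with the abstract's observation that these PEFT methods are less memory-efficient on depthwise-separable architectures.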