Efficient Few-Shot Learning for Edge AI via Knowledge Distillation on MobileViT

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of deploying deep learning on edge devices—particularly data scarcity, limited resources, and stringent latency requirements—by proposing a lightweight pretraining framework based on MobileViT for few-shot learning. It introduces, for the first time, the integration of knowledge distillation with MobileViT to effectively transfer the generalization capability of a large teacher model to a compact student model under label-scarce conditions. Evaluated on MiniImageNet, the approach improves 1-shot and 5-shot accuracy by 14% and 6.7%, respectively, over a ResNet12 baseline, while reducing model parameters by 69% and FLOPs by 88%. On a Jetson Orin Nano, it achieves a 37% reduction in dynamic energy consumption and an inference latency of only 2.6 ms, striking a compelling balance among energy efficiency, accuracy, and real-time performance.
📝 Abstract
Efficient and adaptable deep learning models are an important area of research, driven by the need for highly efficient models on edge devices. Few-shot learning enables the use of deep learning models in low-data regimes, a capability that is highly sought after in real-world applications where collecting large annotated datasets is costly or impractical. This challenge is particularly relevant in edge scenarios, where connectivity may be limited, low-latency responses are required, or energy consumption constraints are critical. We propose and evaluate a pre-training method for the MobileViT backbone designed for edge computing. Specifically, we employ knowledge distillation, which transfers the generalization ability of a large-scale teacher model to a lightweight student model. This method achieves accuracy improvements of 14% and 6.7% for one-shot and five-shot classification, respectively, on the MiniImageNet benchmark, compared to the ResNet12 baseline, while reducing the number of parameters by 69% and the computational complexity (in FLOPs) by 88%. Furthermore, we deployed the proposed models on a Jetson Orin Nano platform and measured power consumption directly at the power supply, showing a 37% reduction in dynamic energy consumption at an inference latency of 2.6 ms. These results demonstrate that the proposed method is a promising and practical solution for deploying few-shot learning models on edge AI hardware.
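The core mechanism the abstract describes — transferring a large teacher's generalization ability to a compact student — is standard knowledge distillation. The paper does not publish its loss formulation here, so the following is only a minimal numpy sketch of the classic Hinton-style KD objective (temperature-softened KL term plus hard-label cross-entropy); the temperature `T` and weight `alpha` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Hinton-style KD loss for a single example:
    alpha * T^2 * KL(teacher || student) at temperature T,
    plus (1 - alpha) * cross-entropy on the hard label.
    T and alpha are illustrative hyperparameters, not from the paper."""
    p_t = softmax(teacher_logits, T)   # softened teacher targets
    p_s = softmax(student_logits, T)   # softened student predictions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))
    ce = -np.log(softmax(student_logits)[label] + 1e-12)
    return alpha * (T ** 2) * kl + (1.0 - alpha) * ce
```

When the student's logits match the teacher's, the KL term vanishes and only the hard-label term remains, so minimizing this loss pulls the student toward both the ground truth and the teacher's full output distribution, which is what lets the lightweight MobileViT student inherit the teacher's generalization under label scarcity.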
Problem

Research questions and friction points this paper is trying to address.

Few-Shot Learning
Edge AI
Model Efficiency
Low-Data Regimes
Energy Constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-Shot Learning
Knowledge Distillation
MobileViT
Edge AI
Model Efficiency
Shuhei Tsuyuki
Research Institute of Electrical Communication, Tohoku University, Japan; Graduate School of Engineering, Tohoku University, Japan
Reda Bensaid
IMT Atlantique, Lab-STICC, UMR CNRS 6285, F-29238 Brest, France
Jérémy Morlier
IMT Atlantique, Lab-STICC, UMR CNRS 6285, F-29238 Brest, France
Mathieu Léonardon
IMT Atlantique, Lab-STICC, UMR CNRS 6285, F-29238 Brest, France
Naoya Onizawa
Research Institute of Electrical Communication, Tohoku University, Japan
Vincent Gripon
IMT Atlantique and Lab-STICC
Takahiro Hanyu
Research Institute of Electrical Communication, Tohoku University, Japan