🤖 AI Summary
This work addresses the challenges of deploying deep learning on edge devices—particularly data scarcity, limited resources, and stringent latency requirements—by proposing a lightweight pretraining framework based on MobileViT for few-shot learning. It introduces, for the first time, the integration of knowledge distillation with MobileViT to effectively transfer the generalization capability of a large teacher model to a compact student model under label-scarce conditions. Evaluated on MiniImageNet, the approach improves 1-shot and 5-shot accuracy by 14% and 6.7%, respectively, over a ResNet12 baseline, while reducing model parameters by 69% and FLOPs by 88%. On a Jetson Orin Nano, it achieves a 37% reduction in dynamic energy consumption and an inference latency of only 2.6 ms, striking a compelling balance among energy efficiency, accuracy, and real-time performance.
📝 Abstract
Efficient and adaptable models are an important area of deep learning research, driven by the need for lightweight architectures on edge devices. Few-shot learning enables the use of deep learning models in low-data regimes, a capability that is highly sought after in real-world applications where collecting large annotated datasets is costly or impractical. This challenge is particularly relevant in edge scenarios, where connectivity may be limited, low-latency responses are required, or energy consumption constraints are critical. We propose and evaluate a pre-training method for the MobileViT backbone designed for edge computing. Specifically, we employ knowledge distillation, which transfers the generalization ability of a large-scale teacher model to a lightweight student model. This method achieves accuracy improvements of 14% and 6.7% for one-shot and five-shot classification, respectively, on the MiniImageNet benchmark, compared to the ResNet12 baseline, while reducing the number of parameters by 69% and the computational complexity (FLOPs) by 88%. Furthermore, we deployed the proposed models on a Jetson Orin Nano platform and measured power consumption directly at the power supply, showing that dynamic energy consumption is reduced by 37% at an inference latency of 2.6 ms. These results demonstrate that the proposed method is a promising and practical solution for deploying few-shot learning models on edge AI hardware.
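The abstract describes knowledge distillation only at a high level (a large teacher supervising a lightweight MobileViT student); the exact objective is not given here. As a minimal sketch, assuming a standard Hinton-style distillation loss — temperature-softened KL divergence between teacher and student output distributions, scaled by T² — the core computation looks like:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The temperature T=4.0 is an illustrative default, not a value
    reported by the paper.
    """
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl

# When the student matches the teacher exactly, the loss is zero;
# any mismatch yields a positive penalty.
print(distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
print(distillation_loss([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]) > 0)
```

In practice this soft-target term is typically combined with a standard cross-entropy loss on the ground-truth labels via a weighting coefficient; the specific balance used in this work is not stated in the abstract.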