🤖 AI Summary
This work addresses the challenges of deploying deep learning on edge devices—particularly data scarcity, limited resources, and stringent latency requirements—by proposing a lightweight pretraining framework based on MobileViT for few-shot learning. It introduces, for the first time, the integration of knowledge distillation with MobileViT to effectively transfer the generalization capability of a large teacher model to a compact student model under label-scarce conditions. Evaluated on MiniImageNet, the approach improves 1-shot and 5-shot accuracy by 14% and 6.7%, respectively, over a ResNet12 baseline, while reducing model parameters by 69% and FLOPs by 88%. On a Jetson Orin Nano, it achieves a 37% reduction in dynamic energy consumption and an inference latency of only 2.6 ms, striking a compelling balance among energy efficiency, accuracy, and real-time performance.
📝 Abstract
Efficient and adaptable models are an important area of deep learning research, driven by the need for lightweight architectures on edge devices. Few-shot learning enables the use of deep learning models in low-data regimes, a capability that is highly sought after in real-world applications where collecting large annotated datasets is costly or impractical. This challenge is particularly relevant in edge scenarios, where connectivity may be limited, low-latency responses are required, or energy consumption constraints are critical. We propose and evaluate a pre-training method for the MobileViT backbone designed for edge computing. Specifically, we employ knowledge distillation, which transfers the generalization ability of a large-scale teacher model to a lightweight student model. This method achieves accuracy improvements of 14% and 6.7% for one-shot and five-shot classification, respectively, on the MiniImageNet benchmark, compared to the ResNet12 baseline, while reducing the number of parameters by 69% and the computational complexity (FLOPs) by 88%. Furthermore, we deployed the proposed models on a Jetson Orin Nano platform and measured power consumption directly at the power supply, showing that dynamic energy consumption is reduced by 37% at an inference latency of 2.6 ms. These results demonstrate that the proposed method is a promising and practical solution for deploying few-shot learning models on edge AI hardware.
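The abstract describes knowledge distillation only at a high level (a large teacher supervising a lightweight MobileViT student); the exact objective is not given here. As a minimal sketch, assuming a standard Hinton-style distillation loss — temperature-softened KL divergence between teacher and student output distributions, scaled by T² — the core computation looks like:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The temperature T=4.0 is an illustrative default, not a value
    reported by the paper.
    """
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl

# When the student matches the teacher exactly, the loss is zero;
# any mismatch yields a positive penalty.
print(distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
print(distillation_loss([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]) > 0)
```

In practice this soft-target term is typically combined with a standard cross-entropy loss on the ground-truth labels via a weighting coefficient; the specific balance used in this work is not stated in the abstract.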