Efficient Partitioning Vision Transformer on Edge Devices for Distributed Inference

📅 2024-10-15
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of deploying Vision Transformers (ViTs) on resource-constrained edge devices, this paper proposes ED-ViT—the first framework enabling class-aware ViT model partitioning and collaborative inference across heterogeneous edge clusters. Methodologically, ED-ViT dynamically partitions the ViT backbone per input class, assigning lightweight, class-specific submodels, and further compresses them via fine-grained channel pruning. A distributed inference scheduling mechanism is designed to optimize cross-device computational load balancing. Extensive experiments across five benchmark datasets and three ViT architectures demonstrate that ED-ViT achieves up to 28.9× model size reduction, significantly lowers inference latency, and incurs negligible accuracy degradation (<0.5%). It consistently outperforms state-of-the-art edge ViT deployment approaches in efficiency–accuracy trade-offs.

📝 Abstract
Deep learning models are increasingly utilized on resource-constrained edge devices for real-time data analytics. Recently, Vision Transformers and their variants have shown exceptional performance in various computer vision tasks. However, their substantial computational requirements and high inference latency create significant challenges for deploying such models on resource-constrained edge devices. To address this issue, we propose a novel framework, ED-ViT, which is designed to efficiently split and execute complex Vision Transformers across multiple edge devices. Our approach involves partitioning Vision Transformer models into several sub-models, each dedicated to handling a specific subset of data classes. To further reduce computational overhead and inference latency, we introduce a class-wise pruning technique that decreases the size of each sub-model. Through extensive experiments conducted on five datasets using three model architectures and actual implementation on edge devices, we demonstrate that our method significantly cuts down inference latency on edge devices and achieves a reduction in model size of up to 28.9 times and 34.1 times, respectively, while maintaining test accuracy comparable to that of the original Vision Transformer. Additionally, we compare ED-ViT with two state-of-the-art methods that deploy CNN and SNN models on edge devices, evaluating metrics such as accuracy, inference time, and overall model size. Our comprehensive evaluation underscores the effectiveness of the proposed ED-ViT framework.
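The class-wise pruning step described in the abstract can be illustrated with a generic magnitude-based channel-pruning sketch. Note this is an assumption for illustration only: the function name, the L1-norm criterion, and the `keep_ratio` parameter are not taken from the paper, whose exact class-wise pruning rule is not reproduced here.

```python
# Illustrative magnitude-based channel pruning (NOT the paper's exact criterion).
def prune_channels(weight, keep_ratio):
    """Keep the rows (output channels) of a weight matrix with the largest L1 norms.

    weight: list of rows, one per output channel of a linear layer.
    keep_ratio: fraction of channels to retain, e.g. 0.25 for a 4x smaller layer.
    Returns (pruned_weight, kept_indices).
    """
    norms = [sum(abs(w) for w in row) for row in weight]  # L1 norm per channel
    k = max(1, round(keep_ratio * len(weight)))
    # Indices of the k highest-norm channels, restored to original order.
    keep = sorted(sorted(range(len(weight)), key=norms.__getitem__)[-k:])
    return [weight[i] for i in keep], keep
```

In a class-wise setting, one would compute channel importance per class subset, so each sub-model keeps only the channels that matter for its assigned classes.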
Problem

Research questions and friction points this paper is trying to address.

Efficiently partition Vision Transformers for edge devices
Reduce computational overhead and inference latency
Maintain accuracy while decreasing model size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partitions Vision Transformer across edge devices
Uses class-wise pruning to reduce model size
Maintains accuracy while cutting latency significantly
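The partition-and-merge idea above can be sketched as follows. All names and the max-confidence merge rule are illustrative assumptions: this summary does not detail ED-ViT's actual fusion mechanism, so this is a minimal sketch of one plausible scheme, not the paper's implementation.

```python
# Hypothetical sketch of class-wise partitioned inference; names and the
# max-score merge rule are illustrative, not taken from the ED-ViT paper.

def make_submodel(class_subset, weights):
    """Build a toy sub-model that scores only its assigned classes.

    weights: dict mapping class id -> per-feature weight vector
    (a stand-in for a pruned, class-specific ViT sub-model).
    """
    def submodel(features):
        return {c: sum(f * w for f, w in zip(features, weights[c]))
                for c in class_subset}
    return submodel

def distributed_inference(submodels, features):
    """Run every sub-model (one per edge device) and merge by max confidence."""
    scores = {}
    for sm in submodels:
        scores.update(sm(features))
    return max(scores, key=scores.get)
```

Each device hosts one lightweight sub-model, so the per-device compute and memory footprint shrinks with the number of class subsets, at the cost of running the sub-models in parallel and merging their outputs.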
Authors

Xiang Liu (National University of Singapore, Singapore)
Yijun Song (Zhejiang University of Finance & Economics, Hangzhou, Zhejiang, China)
Xia Li (ETH Zürich, Zürich, Switzerland)
Yifei Sun (Zhejiang University, Hangzhou, Zhejiang, China)
Huiying Lan (Lumia Ltd., Oxford, United Kingdom)
Zemin Liu (Zhejiang University; Graph Learning, Graph Imbalanced Learning)
Linshan Jiang (Research Fellow, Institute of Data Science (IDS), NUS; Privacy-preserving Machine Learning, Collaborative Machine Learning, Edge-Cloud Collaboration, Web3)
Jialin Li (National University of Singapore, Singapore)