🤖 AI Summary
To address the challenge of deploying Vision Transformers (ViTs) on resource-constrained edge devices, this paper proposes ED-ViT, the first framework enabling class-wise ViT partitioning and collaborative inference across edge device clusters. Methodologically, ED-ViT splits the ViT into several lightweight sub-models, each dedicated to a specific subset of data classes, and further compresses each sub-model via class-wise pruning. A distributed inference mechanism balances computational load across devices. Extensive experiments on five benchmark datasets and three ViT architectures demonstrate that ED-ViT reduces inference latency and model size by up to 28.9× and 34.1×, respectively, with negligible accuracy degradation (<0.5%). It consistently outperforms state-of-the-art CNN- and SNN-based edge deployment approaches in the efficiency–accuracy trade-off.
📝 Abstract
Deep learning models are increasingly utilized on resource-constrained edge devices for real-time data analytics. Recently, Vision Transformers and their variants have shown exceptional performance in various computer vision tasks. However, their substantial computational requirements and high inference latency pose significant challenges for deployment on resource-constrained edge devices. To address this issue, we propose a novel framework, ED-ViT, designed to efficiently split and execute complex Vision Transformers across multiple edge devices. Our approach partitions a Vision Transformer model into several sub-models, each dedicated to handling a specific subset of data classes. To further reduce computational overhead and inference latency, we introduce a class-wise pruning technique that decreases the size of each sub-model. Through extensive experiments on five datasets with three model architectures, including actual implementation on edge devices, we demonstrate that our method significantly reduces inference latency and model size by up to 28.9 times and 34.1 times, respectively, while maintaining test accuracy comparable to the original Vision Transformer. Additionally, we compare ED-ViT with two state-of-the-art methods that deploy CNN and SNN models on edge devices, evaluating accuracy, inference time, and overall model size. Our comprehensive evaluation underscores the effectiveness of the proposed ED-ViT framework.
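To make the partition-and-merge idea concrete, here is a minimal sketch (not the authors' code) of class-wise collaborative inference: each sub-model covers a subset of classes and runs on its own device, and the final prediction is the class with the highest confidence across all sub-models. All names (`SubModel`, `collaborative_predict`) and the dictionary-of-scores interface are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of class-wise collaborative inference (ED-ViT style).
# Each SubModel would be a pruned ViT handling only its assigned classes;
# here score_fn is a stand-in returning per-class confidences.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubModel:
    """A lightweight sub-model responsible for a subset of class IDs."""
    classes: Tuple[int, ...]
    score_fn: Callable[[object], Dict[int, float]]  # input -> {class: confidence}


def collaborative_predict(submodels: List[SubModel], x: object) -> int:
    """Run every sub-model (e.g., one per edge device) on input x and
    return the class with the highest confidence across all sub-models."""
    best_class, best_score = -1, float("-inf")
    for sm in submodels:
        scores = sm.score_fn(x)
        for c in sm.classes:
            s = scores.get(c, float("-inf"))
            if s > best_score:
                best_class, best_score = c, s
    return best_class


# Toy demo: two sub-models covering classes {0, 1} and {2, 3}.
sm_a = SubModel(classes=(0, 1), score_fn=lambda x: {0: 0.2, 1: 0.1})
sm_b = SubModel(classes=(2, 3), score_fn=lambda x: {2: 0.9, 3: 0.3})
print(collaborative_predict([sm_a, sm_b], x=None))  # → 2
```

In a real deployment each `score_fn` call would execute on a different device, so the merge step only needs to exchange a few confidence values per input rather than full feature maps.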