Adaptive Swin Transformer Partitioning over AI-RAN Networks

📅 2026-04-26

🤖 AI Summary
This work addresses the challenges of deploying Transformer-based real-time video object detection over dynamic 5G AI-RAN networks, where large intermediate activations incur high communication overhead and unstable latency. The study extends throughput-aware adaptive model splitting, previously applied to CNNs, to a Swin Transformer backbone, enabling dynamic selection of split points without retraining. It further introduces an accuracy-preserving activation compression pipeline that reduces uplink payload, and integrates distributed User Plane Functions (dUPFs) to lower user-plane latency. Experiments on an NVIDIA Aerial-based AI-RAN testbed show that the approach substantially reduces uplink traffic and user-plane latency while improving runtime stability. The work also quantifies practical trade-offs among latency, energy consumption, and privacy in realistic deployments.
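
Throughput-aware split selection can be pictured as a latency minimization over candidate split points. The sketch below assumes that end-to-end latency is roughly device compute + uplink transfer + edge compute, with the transfer term driven by the currently measured uplink throughput. Activation sizes follow the standard Swin-T stage shapes (4x4 patch embedding, resolution halved and channels doubled per stage) for a 224x224 input. The compute times, the input resolution, and every name here (`SplitCandidate`, `pick_split`, `swin_stage_bytes`) are hypothetical placeholders, not the paper's profiling data or its selection rule.

```python
from dataclasses import dataclass
from typing import List


def swin_stage_bytes(height: int, width: int, stage: int,
                     embed_dim: int = 96, bytes_per_elem: int = 4) -> int:
    """Bytes in the output of Swin stage `stage` (1..4) for an H x W input.

    Standard Swin layout: 4x4 patch embedding, then each later stage halves
    the spatial resolution and doubles the channels. embed_dim=96 is Swin-T.
    """
    downsample = 4 * 2 ** (stage - 1)
    channels = embed_dim * 2 ** (stage - 1)
    return (height // downsample) * (width // downsample) * channels * bytes_per_elem


@dataclass
class SplitCandidate:
    name: str
    device_ms: float    # UE-side compute up to the split (hypothetical; measure it)
    server_ms: float    # edge-side compute for the remaining layers (hypothetical)
    payload_bytes: int  # activation bytes sent over the uplink


def uplink_ms(payload_bytes: int, throughput_mbps: float) -> float:
    """Transfer time in milliseconds at the given uplink throughput (Mbit/s)."""
    return payload_bytes * 8 / (throughput_mbps * 1e6) * 1e3


def estimated_latency_ms(c: SplitCandidate, throughput_mbps: float) -> float:
    """Simple additive model: device compute + uplink transfer + edge compute."""
    return c.device_ms + uplink_ms(c.payload_bytes, throughput_mbps) + c.server_ms


def pick_split(candidates: List[SplitCandidate], throughput_mbps: float) -> SplitCandidate:
    """Choose the split with the lowest estimated end-to-end latency."""
    return min(candidates, key=lambda c: estimated_latency_ms(c, throughput_mbps))


# Placeholder compute times for a 224x224 float32 input; NOT measurements from the paper.
CANDIDATES = [
    SplitCandidate(f"after_stage{s}", dev, srv, swin_stage_bytes(224, 224, s))
    for s, dev, srv in [(1, 8.0, 12.0), (2, 14.0, 9.0), (3, 30.0, 3.0), (4, 38.0, 1.0)]
]

for mbps in (100.0, 1000.0):
    print(mbps, pick_split(CANDIDATES, mbps).name)
```

With these placeholder numbers, the 100 Mbit/s case selects `after_stage4` (smallest payload) and the 1000 Mbit/s case selects `after_stage2`, which illustrates why re-evaluating the split as throughput changes can pay off. A real deployment would refresh the throughput estimate continuously and would likely add terms such as detection-head cost or a full-offload option.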

📝 Abstract
This paper demonstrates the feasibility of transformer-based split inference for real-time video object detection over dynamic 5G AI-RAN networks. We extend throughput-aware adaptive splitting from CNNs to a Swin Transformer backbone and show that practical split execution is achievable for transformer-based vision models without retraining. To address the large intermediate activations inherent to transformers, we introduce an efficient, accuracy-preserving activation compression pipeline that substantially reduces uplink payload. The complete system, including adaptive split selection, transformer inference, and compression, is implemented and validated end-to-end on a real-time detection workload, with distributed UPF (dUPF) integration further reducing user-plane latency and improving runtime stability. Extensive measurements on an NVIDIA Aerial-based AI-RAN testbed jointly account for inference and 5G communication energy, quantifying the latency-energy-privacy trade-offs in realistic deployments.
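
The abstract does not spell out the compression pipeline. To illustrate the general idea only, the sketch below swaps in a common baseline: per-tensor 8-bit affine quantization followed by lossless DEFLATE via Python's `zlib`. This is an assumption, not the authors' method; the helper names are hypothetical, and whether detection accuracy is preserved would have to be measured on the actual detector. Plain lists keep the sketch dependency-free, whereas a real pipeline would operate on tensors.

```python
import array
import random
import zlib
from typing import List, Tuple


def compress_activation(values: List[float]) -> Tuple[bytes, float, float]:
    """Quantize to 8-bit codes with a per-tensor affine map, then DEFLATE them.

    Returns (payload, offset, scale); the receiver needs all three to decode.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0  # constant tensor: any scale works
    codes = array.array("B", (round((v - lo) / scale) for v in values))
    return zlib.compress(codes.tobytes(), 6), lo, scale


def decompress_activation(payload: bytes, lo: float, scale: float) -> List[float]:
    """Invert compress_activation up to a per-element error of at most scale / 2."""
    # Iterating over bytes yields the integer codes 0..255.
    return [lo + code * scale for code in zlib.decompress(payload)]


# Stand-in activation: 28*28 tokens x 8 channels of Gaussian noise (not real Swin data).
rng = random.Random(0)
activation = [rng.gauss(0.0, 1.0) for _ in range(28 * 28 * 8)]

payload, lo, scale = compress_activation(activation)
restored = decompress_activation(payload, lo, scale)

raw_bytes = 4 * len(activation)  # size if sent as float32
print(f"float32: {raw_bytes} B, compressed: {len(payload)} B")
print(f"max abs error: {max(abs(a - b) for a, b in zip(activation, restored)):.4f}")
```

Quantization alone cuts the float32 payload by 4x; the DEFLATE stage only helps further when the codes are skewed or repetitive, which depends on the real activation statistics.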

Problem

Research questions and friction points this paper is trying to address.

split inference
Swin Transformer
AI-RAN
activation compression
real-time video object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Splitting
Swin Transformer
Activation Compression
AI-RAN
Split Inference