RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA models

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing edge-cloud collaborative inference methods for Vision-Language-Action (VLA) models are highly susceptible to visual noise and overlook step-level redundancy inherent in embodied tasks, thereby compromising the physical continuity of generated actions. To address these limitations, this work proposes RAPID, a novel framework that, for the first time, explicitly models step-level redundancy in embodied tasks and introduces a noise-resilient, continuity-preserving edge-cloud partitioning mechanism. By jointly leveraging environmental and task-specific features, RAPID dynamically optimizes redundancy-aware inference partitioning strategies. This approach achieves up to 1.73× inference acceleration with only 5%–7% additional overhead, significantly enhancing both computational efficiency and action coherence in embodied reasoning.

📝 Abstract
Vision-Language-Action (VLA) models are mainstream in embodied intelligence but incur high inference costs. Edge-Cloud Collaborative (ECC) inference offers an effective remedy by relieving the computing pressure on edge devices so that real-time requirements can be met. However, existing ECC frameworks are suboptimal for VLA models because of two challenges: (1) mainstream environment-oriented edge-cloud partitioning methods are susceptible to interference from visual noise; (2) existing edge-cloud partitioning methods overlook the step-wise redundancy unique to embodied tasks, thereby disrupting the physical continuity of motion. To address these issues, we propose a novel ECC inference framework, termed RAPID. Specifically, we develop an implementation tailored to the proposed framework. Experiments demonstrate that it achieves a speedup of up to 1.73× with only 5%–7% overhead.
Problem

Research questions and friction points this paper is trying to address.

Vision Language Action
Edge-Cloud Collaborative inference
visual noise
step-wise redundancy
physical continuity
Innovation

Methods, ideas, or system contributions that make the work stand out.

edge-cloud collaborative inference
VLA models
step-wise redundancy
partitioning optimization
embodied intelligence
Zihao Zheng
Peking University
Machine Learning System, Edge Computing, Computer Architecture, EDA
Sicheng Tian
School of Artificial Intelligence, Beijing Normal University, Beijing, China
Hangyu Cao
School of Software Engineering, South China University of Technology, Guangzhou, China
Chenyue Li
Hong Kong University of Science and Technology
AI for Science, Large Language Model
Jiayu Chen
PhD student, IFLab@PKU
Efficient Visual Generation, ML system
Maoliang Li
School of Computer Science, Peking University, Beijing, China
Xinhao Sun
School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Hailong Zou
School of Computer Science, Peking University, Beijing, China
Guojie Luo
Peking University
Electronic Design Automation, Reconfigurable Architecture
Xiang Chen
School of Computer Science, Peking University, Beijing, China