🤖 AI Summary
This work addresses the high inference latency of existing Vision-Language-Action (VLA) models, caused by their large parameter counts, and the inadequacy of static or coarse-grained pruning in dynamic environments. The authors propose EcoVLA, a training-free, plug-and-play framework that enables fine-grained adaptive pruning for the first time. EcoVLA employs Environment-aware Adaptive Pruning (EAP), which exploits the temporal consistency of environmental states to dynamically update channel sparsity patterns, and Interleaved Inference Orchestration (I²O), which schedules pruning decisions in parallel during computational gaps in inference. The method is orthogonal to existing acceleration techniques, achieving up to 1.60× speedup across multiple VLA models with only a 0.4% drop in success rate; when combined with token pruning, it reaches 2.18× acceleration with merely 0.5% performance degradation, and its effectiveness is further demonstrated on real-world robotic tasks.
📝 Abstract
While Vision-Language-Action (VLA) models hold promise for embodied intelligence, their large parameter counts lead to substantial inference latency that hinders real-time manipulation, motivating parameter sparsification. However, as the environment evolves during VLA execution, the optimal sparsity patterns change accordingly. Static pruning lacks the adaptability required for environment dynamics, whereas fixed-interval dynamic layer pruning suffers from coarse granularity and high retraining overhead. To bridge this gap, we propose EcoVLA, a training-free, plug-and-play adaptive pruning framework that supports orthogonal combination with existing VLA acceleration methods. EcoVLA comprises two components: Environment-aware Adaptive Pruning (EAP) and Interleaved Inference Orchestration ($I^2O$). EAP is a lightweight adaptive channel pruning method that exploits the temporal consistency of the physical environment to update sparsity patterns. $I^2O$ leverages the FLOPs bubbles inherent in VLA inference to schedule pruning computations in parallel, ensuring negligible impact on latency. Evaluated on diverse VLA models and benchmarks, EcoVLA delivers state-of-the-art performance, achieving up to 1.60$\times$ speedup with only a 0.4% drop in success rate, and further reaches 2.18$\times$ speedup with only a 0.5% degradation when combined with token pruning. We further validate the effectiveness of EcoVLA on real-world robots.
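To make the EAP idea concrete, here is a minimal sketch of environment-aware channel-mask reuse. Everything below is a hypothetical illustration, not the paper's implementation: the importance criterion (mean absolute activation), the cosine-similarity test on an assumed environment feature vector, and the names `AdaptiveChannelPruner`, `channel_importance`, and `build_mask` are all our own assumptions standing in for the paper's actual EAP metric and update rule.

```python
import numpy as np

def channel_importance(activations):
    # Proxy importance score: mean absolute activation per channel.
    # (Hypothetical criterion; the paper's exact EAP metric may differ.)
    return np.abs(activations).mean(axis=0)

def build_mask(importance, sparsity):
    # Keep the top-(1 - sparsity) fraction of channels by importance.
    k = max(1, int(round(len(importance) * (1.0 - sparsity))))
    keep = np.argsort(importance)[-k:]
    mask = np.zeros(len(importance), dtype=bool)
    mask[keep] = True
    return mask

class AdaptiveChannelPruner:
    """Toy sketch: reuse the sparsity pattern across timesteps while the
    environment stays similar (temporal consistency); refresh it otherwise."""

    def __init__(self, sparsity=0.5, sim_threshold=0.95):
        self.sparsity = sparsity
        self.sim_threshold = sim_threshold
        self.prev_env = None
        self.mask = None

    def step(self, env_feat, activations):
        if self.prev_env is not None and self.mask is not None:
            # Cosine similarity between consecutive environment features.
            sim = float(env_feat @ self.prev_env) / (
                np.linalg.norm(env_feat) * np.linalg.norm(self.prev_env) + 1e-8
            )
            if sim >= self.sim_threshold:
                return self.mask  # environment barely changed: reuse mask
        # Environment changed (or first step): recompute the sparsity pattern.
        self.mask = build_mask(channel_importance(activations), self.sparsity)
        self.prev_env = env_feat
        return self.mask
```

In a full system, the mask recomputation in `step` is exactly the work that $I^2O$ would schedule into idle compute gaps of the inference pipeline, so the decision never sits on the critical path.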