🤖 AI Summary
This work addresses the high inference latency of existing Vision-Language-Action (VLA) models, caused by their large parameter counts, and the inadequacy of static or coarse-grained pruning in dynamic environments. The authors propose EcoVLA, a training-free, plug-and-play framework that enables fine-grained adaptive pruning for the first time. EcoVLA employs Environment-aware Adaptive Pruning (EAP), which exploits the temporal consistency of environmental states to dynamically update channel sparsity patterns, and Interleaved Inference Orchestration (I²O), which schedules pruning decisions in parallel during computational gaps in inference. The method is orthogonal to existing acceleration techniques, achieving up to 1.60× speedup across multiple VLA models with only a 0.4% drop in success rate; when combined with token pruning, it reaches 2.18× acceleration with merely 0.5% performance degradation, and its effectiveness is further demonstrated on real-world robotic tasks.
📝 Abstract
While Vision-Language-Action (VLA) models hold promise for embodied intelligence, their large parameter counts lead to substantial inference latency that hinders real-time manipulation, motivating parameter sparsification. However, as the environment evolves during VLA execution, the optimal sparsity patterns change accordingly. Static pruning lacks the adaptability required for environment dynamics, whereas fixed-interval dynamic layer pruning suffers from coarse granularity and high retraining overhead. To bridge this gap, we propose EcoVLA, a training-free, plug-and-play adaptive pruning framework that supports orthogonal combination with existing VLA acceleration methods. EcoVLA comprises two components: Environment-aware Adaptive Pruning (EAP) and Interleaved Inference Orchestration ($I^2O$). EAP is a lightweight adaptive channel pruning method that exploits the temporal consistency of the physical environment to update sparsity patterns. $I^2O$ leverages the FLOPs bubbles inherent in VLA inference to schedule pruning computations in parallel, ensuring negligible impact on latency. Evaluated on diverse VLA models and benchmarks, EcoVLA delivers state-of-the-art performance, achieving up to 1.60$\times$ speedup with only a 0.4% drop in success rate, and further reaches 2.18$\times$ speedup with only a 0.5% degradation when combined with token pruning. We further validate the effectiveness of EcoVLA on real-world robots.
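To make the EAP idea concrete, here is a minimal sketch of environment-aware channel-mask reuse. Everything below is a hypothetical illustration, not the paper's implementation: the importance criterion (mean absolute activation), the cosine-similarity test on an assumed environment feature vector, and the names `AdaptiveChannelPruner`, `channel_importance`, and `build_mask` are all our own assumptions standing in for the paper's actual EAP metric and update rule.

```python
import numpy as np

def channel_importance(activations):
    # Proxy importance score: mean absolute activation per channel.
    # (Hypothetical criterion; the paper's exact EAP metric may differ.)
    return np.abs(activations).mean(axis=0)

def build_mask(importance, sparsity):
    # Keep the top-(1 - sparsity) fraction of channels by importance.
    k = max(1, int(round(len(importance) * (1.0 - sparsity))))
    keep = np.argsort(importance)[-k:]
    mask = np.zeros(len(importance), dtype=bool)
    mask[keep] = True
    return mask

class AdaptiveChannelPruner:
    """Toy sketch: reuse the sparsity pattern across timesteps while the
    environment stays similar (temporal consistency); refresh it otherwise."""

    def __init__(self, sparsity=0.5, sim_threshold=0.95):
        self.sparsity = sparsity
        self.sim_threshold = sim_threshold
        self.prev_env = None
        self.mask = None

    def step(self, env_feat, activations):
        if self.prev_env is not None and self.mask is not None:
            # Cosine similarity between consecutive environment features.
            sim = float(env_feat @ self.prev_env) / (
                np.linalg.norm(env_feat) * np.linalg.norm(self.prev_env) + 1e-8
            )
            if sim >= self.sim_threshold:
                return self.mask  # environment barely changed: reuse mask
        # Environment changed (or first step): recompute the sparsity pattern.
        self.mask = build_mask(channel_importance(activations), self.sparsity)
        self.prev_env = env_feat
        return self.mask
```

In a full system, the mask recomputation in `step` is exactly the work that $I^2O$ would schedule into idle compute gaps of the inference pipeline, so the decision never sits on the critical path.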