🤖 AI Summary
Existing vision-language-action (VLA) models suffer from high inference overhead, hindering real-time deployment on edge devices and web browsers. To address this, we propose BLURR—a lightweight, plug-and-play inference framework that requires no model retraining or weight modification. Our method introduces three key innovations: (1) the first instruction-prefixed KV cache reuse mechanism to eliminate redundant computation across sequential steps; (2) FP16/INT8 mixed-precision arithmetic for efficient tensor operations; and (3) a single-step rollout scheduling strategy optimized for the Pi-Zero controller. Evaluated on SimplerEnv, BLURR maintains original task success rates while reducing FLOPs by up to 4.2× and end-to-end latency by 3.8×. Furthermore, it enables a low-latency, interactive web-based demo with real-time policy switching. BLURR establishes a new paradigm for efficient, hardware-aware VLA model deployment without compromising performance.
📝 Abstract
Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per-step computation. In our SimplerEnv-based evaluation, BLURR maintains task success rates comparable to the original controller while significantly lowering effective FLOPs and wall-clock latency. We also build an interactive web demo that lets users switch between controllers and toggle inference options in real time while watching manipulation episodes. This highlights BLURR as a practical approach for deploying modern VLA policies under tight compute budgets.
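The instruction-prefix key-value cache can be sketched as follows. This is an illustrative toy, not BLURR's actual API: the class and function names are invented, and the single-layer attention with identity projection weights is a deliberately tiny stand-in for a real VLA transformer. The idea it demonstrates is the one stated above: the language instruction is fixed across a rollout, so its keys and values are projected once and reused at every control step, while only the fresh observation tokens are processed per step.

```python
# Sketch of instruction-prefix KV-cache reuse (hypothetical names;
# not the real BLURR implementation).
import numpy as np

D = 8  # toy hidden size

def kv_project(tokens: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Toy key/value projection for a single attention layer."""
    w_k = np.eye(D)        # identity weights keep the sketch deterministic
    w_v = np.eye(D) * 2.0
    return tokens @ w_k, tokens @ w_v

class PrefixKVCache:
    """Project the instruction's K/V once; reuse them every control step."""
    def __init__(self, instruction_tokens: np.ndarray):
        self.k_prefix, self.v_prefix = kv_project(instruction_tokens)
        self.prefix_computations = 1   # instruction processed exactly once

    def step(self, obs_tokens: np.ndarray) -> np.ndarray:
        # Only the new observation tokens are projected here; the cached
        # instruction K/V is concatenated, never recomputed.
        k_obs, v_obs = kv_project(obs_tokens)
        k = np.concatenate([self.k_prefix, k_obs])
        v = np.concatenate([self.v_prefix, v_obs])
        q = obs_tokens                 # toy queries: the observations themselves
        scores = q @ k.T
        scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
        scores /= scores.sum(axis=-1, keepdims=True)
        return scores @ v              # attended features per observation token

rng = np.random.default_rng(0)
instruction = rng.normal(size=(16, D))   # 16 instruction tokens
cache = PrefixKVCache(instruction)
for _ in range(50):                      # 50 control steps in one rollout...
    obs = rng.normal(size=(4, D))        # ...each with 4 fresh obs tokens
    out = cache.step(obs)                # prefix K/V reused, not recomputed
```

Across the 50-step rollout the instruction prefix is projected once instead of 50 times, which is where the per-step FLOP savings come from; a single-step rollout schedule compounds this by keeping each step's new-token workload small.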