BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models

📅 2025-12-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-language-action (VLA) models suffer from high inference overhead, hindering real-time deployment on edge devices and web browsers. To address this, we propose BLURR—a lightweight, plug-and-play inference framework that requires no model retraining or weight modification. Our method introduces three key innovations: (1) the first instruction-prefixed KV cache reuse mechanism to eliminate redundant computation across sequential steps; (2) FP16/INT8 mixed-precision arithmetic for efficient tensor operations; and (3) a single-step rollout scheduling strategy optimized for the Pi-Zero controller. Evaluated on SimplerEnv, BLURR maintains original task success rates while reducing FLOPs by up to 4.2× and end-to-end latency by 3.8×. Furthermore, it enables a low-latency, interactive web-based demo with real-time policy switching. BLURR establishes a new paradigm for efficient, hardware-aware VLA model deployment without compromising performance.
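The first innovation, instruction-prefixed KV cache reuse, can be sketched as follows. This is a hypothetical minimal illustration (the `PrefixKVCache` class and `encode_fn` interface are assumptions, not the paper's published code): since the language instruction is fixed across an episode, its key/value activations can be computed once and reused at every control step, so each step only processes the new observation tokens.

```python
class PrefixKVCache:
    """Caches key/value activations for a fixed instruction prefix so
    sequential control steps skip recomputing the prefix attention state."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn   # maps token ids -> (keys, values)
        self._cache = {}

    def get(self, instruction_tokens):
        key = tuple(instruction_tokens)
        if key not in self._cache:           # encoded once per instruction
            self._cache[key] = self.encode_fn(instruction_tokens)
        return self._cache[key]

# Toy stand-in encoder: "keys"/"values" are doubled/tripled token ids,
# and we count calls to show the prefix is only encoded once.
calls = []
def toy_encode(tokens):
    calls.append(list(tokens))
    return [2 * t for t in tokens], [3 * t for t in tokens]

cache = PrefixKVCache(toy_encode)
k1, v1 = cache.get([5, 7])   # first control step: prefix is encoded
k2, v2 = cache.get([5, 7])   # later steps: cached state is reused
```

In a real transformer stack the cached entries would be per-layer key/value tensors rather than token lists, but the reuse pattern is the same.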

📝 Abstract
Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per-step computation. In our SimplerEnv-based evaluation, BLURR maintains task success rates comparable to the original controller while significantly lowering effective FLOPs and wall-clock latency. We also build an interactive web demo that allows users to switch between controllers and toggle inference options in real time while watching manipulation episodes. This highlights BLURR as a practical approach for deploying modern VLA policies under tight compute budgets.
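One ingredient of mixed-precision execution is low-bit quantization of tensors. As a hedged illustration (not the paper's actual FP16/INT8 kernels), the sketch below shows symmetric per-tensor INT8 quantization: floats are mapped to int8 with a single scale factor, trading a bounded round-trip error for cheaper integer arithmetic and storage.

```python
def quantize_int8(xs):
    """Map floats to int8 values with a per-tensor scale.

    Returns (quantized ints, scale); round-trip error is at most
    about scale/2 per element.
    """
    scale = max(abs(x) for x in xs) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the scale."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.02]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
```

Production systems typically keep numerically sensitive operations (e.g. accumulations, normalizations) in FP16/FP32 and quantize only the heavy matrix multiplies; that mixing policy is what "mixed precision" refers to.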
Problem

Research questions and friction points this paper is trying to address.

Inference stacks are too heavy for responsive web demos
Accelerating VLA control typically requires retraining
Deploying VLA policies under tight compute budgets is difficult
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight wrapper for VLA models, no retraining required
Key-value caching and mixed-precision execution for acceleration
Single-step rollout schedule to cut per-step computation
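The single-step rollout idea can be sketched with a generic integration loop (an assumed interface, not the paper's Pi-Zero code): if the action head is produced by integrating a learned velocity field over multiple steps, the number of steps is a schedule knob, and a single-step schedule collapses the rollout to one forward pass per control tick.

```python
def rollout(velocity_fn, x0, num_steps):
    """Euler-integrate dx/dt = velocity_fn(x, t) from t=0 to t=1.

    Each step costs one call to velocity_fn, so num_steps directly
    controls per-tick compute.
    """
    x, dt = list(x0), 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy constant velocity field: every schedule reaches the same action,
# but the single-step schedule uses 10x fewer model calls here.
const_v = lambda x, t: [1.0] * len(x)
a_multi = rollout(const_v, [0.0, 0.0], num_steps=10)
a_single = rollout(const_v, [0.0, 0.0], num_steps=1)
```

For nontrivial velocity fields the single-step result is an approximation, which is why the paper validates that task success rates are preserved rather than assuming equivalence.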
Xiaoyu Ma
Carnegie Mellon University
Transportation network modeling, machine learning, reinforcement learning, simulation, optimization
Zhengqing Yuan
PhD student, University of Notre Dame
NLP, Deep learning, CV
Zheyuan Zhang
University of Notre Dame
Kaiwen Shi
University of Notre Dame
Lichao Sun
Lehigh University
Yanfang Ye
University of Notre Dame