BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models

📅 2025-12-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-language-action (VLA) models suffer from high inference overhead, hindering real-time deployment on edge devices and web browsers. To address this, we propose BLURR—a lightweight, plug-and-play inference framework that requires no model retraining or weight modification. Our method introduces three key innovations: (1) the first instruction-prefixed KV cache reuse mechanism to eliminate redundant computation across sequential steps; (2) FP16/INT8 mixed-precision arithmetic for efficient tensor operations; and (3) a single-step rollout scheduling strategy optimized for the Pi-Zero controller. Evaluated on SimplerEnv, BLURR maintains original task success rates while reducing FLOPs by up to 4.2× and end-to-end latency by 3.8×. Furthermore, it enables a low-latency, interactive web-based demo with real-time policy switching. BLURR establishes a new paradigm for efficient, hardware-aware VLA model deployment without compromising performance.
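The first innovation, instruction-prefixed KV cache reuse, can be sketched as follows. This is a hypothetical minimal illustration (the `PrefixKVCache` class and `encode_fn` interface are assumptions, not the paper's published code): since the language instruction is fixed across an episode, its key/value activations can be computed once and reused at every control step, so each step only processes the new observation tokens.

```python
class PrefixKVCache:
    """Caches key/value activations for a fixed instruction prefix so
    sequential control steps skip recomputing the prefix attention state."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn   # maps token ids -> (keys, values)
        self._cache = {}

    def get(self, instruction_tokens):
        key = tuple(instruction_tokens)
        if key not in self._cache:           # encoded once per instruction
            self._cache[key] = self.encode_fn(instruction_tokens)
        return self._cache[key]

# Toy stand-in encoder: "keys"/"values" are doubled/tripled token ids,
# and we count calls to show the prefix is only encoded once.
calls = []
def toy_encode(tokens):
    calls.append(list(tokens))
    return [2 * t for t in tokens], [3 * t for t in tokens]

cache = PrefixKVCache(toy_encode)
k1, v1 = cache.get([5, 7])   # first control step: prefix is encoded
k2, v2 = cache.get([5, 7])   # later steps: cached state is reused
```

In a real transformer stack the cached entries would be per-layer key/value tensors rather than token lists, but the reuse pattern is the same.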

📝 Abstract
Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per-step computation. In our SimplerEnv-based evaluation, BLURR maintains task success rates comparable to the original controller while significantly lowering effective FLOPs and wall-clock latency. We also build an interactive web demo that allows users to switch between controllers and toggle inference options in real time while watching manipulation episodes. This highlights BLURR as a practical approach for deploying modern VLA policies under tight compute budgets.
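One ingredient of mixed-precision execution is low-bit quantization of tensors. As a hedged illustration (not the paper's actual FP16/INT8 kernels), the sketch below shows symmetric per-tensor INT8 quantization: floats are mapped to int8 with a single scale factor, trading a bounded round-trip error for cheaper integer arithmetic and storage.

```python
def quantize_int8(xs):
    """Map floats to int8 values with a per-tensor scale.

    Returns (quantized ints, scale); round-trip error is at most
    about scale/2 per element.
    """
    scale = max(abs(x) for x in xs) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the scale."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.02]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
```

Production systems typically keep numerically sensitive operations (e.g. accumulations, normalizations) in FP16/FP32 and quantize only the heavy matrix multiplies; that mixing policy is what "mixed precision" refers to.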
Problem

Research questions and friction points this paper is trying to address.

Inference stacks are too heavy for responsive web demos
Accelerating VLA control typically requires retraining
Deploying VLA policies under tight compute budgets is difficult
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight wrapper for VLA models, no retraining required
Key-value caching and mixed-precision execution for acceleration
Single-step rollout schedule to cut per-step computation
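The single-step rollout idea can be sketched with a generic integration loop (an assumed interface, not the paper's Pi-Zero code): if the action head is produced by integrating a learned velocity field over multiple steps, the number of steps is a schedule knob, and a single-step schedule collapses the rollout to one forward pass per control tick.

```python
def rollout(velocity_fn, x0, num_steps):
    """Euler-integrate dx/dt = velocity_fn(x, t) from t=0 to t=1.

    Each step costs one call to velocity_fn, so num_steps directly
    controls per-tick compute.
    """
    x, dt = list(x0), 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy constant velocity field: every schedule reaches the same action,
# but the single-step schedule uses 10x fewer model calls here.
const_v = lambda x, t: [1.0] * len(x)
a_multi = rollout(const_v, [0.0, 0.0], num_steps=10)
a_single = rollout(const_v, [0.0, 0.0], num_steps=1)
```

For nontrivial velocity fields the single-step result is an approximation, which is why the paper validates that task success rates are preserved rather than assuming equivalence.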
Xiaoyu Ma
Carnegie Mellon University
Transportation network modeling, machine learning, reinforcement learning, simulation, optimization
Zhengqing Yuan
PhD student, University of Notre Dame
NLP, Deep learning, CV
Zheyuan Zhang
University of Notre Dame
Kaiwen Shi
University of Notre Dame
Lichao Sun
Lehigh University
Yanfang Ye
University of Notre Dame