VLAgents: A Policy Server for Efficient VLA Inference

📅 2026-01-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two obstacles to deploying Vision-Language-Action (VLA) models on robotic systems: fragmented interfaces and communication latency. The authors propose VLAgents, a modular policy server that wraps VLA inference behind a unified Gymnasium-style interface. Its communication layer adapts to the deployment context, using zero-copy shared memory for local execution to speed up simulation and compressed streaming for remote operation to reduce bandwidth overhead. The server integrates seven policies, including OpenVLA and Pi Zero, and in a benchmark covering both local and remote communication it outperforms the default policy servers of OpenVLA, OpenPi, and LeRobot.
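The context-dependent choice between shared memory and compressed streaming can be illustrated with a minimal sketch. This is not the VLAgents implementation: the function names, the `LOCAL_HOSTS` rule for deciding what counts as local, and the use of `zlib` as the compressor are all assumptions made for illustration, using only the Python standard library.

```python
import zlib
from multiprocessing import shared_memory

# Assumption: a connection is "local" when the server address is a loopback name.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def choose_transport(host: str) -> str:
    """Pick a transport from the server address (hypothetical rule)."""
    return "shared_memory" if host in LOCAL_HOSTS else "compressed_stream"


def send_local(payload: bytes) -> shared_memory.SharedMemory:
    """Write the payload into a shared-memory segment.

    Only the segment name and payload size need to be passed to the other
    process; the payload itself is not sent over a socket.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def receive_local(name: str, size: int) -> bytes:
    """Attach to an existing segment by name and read `size` bytes.

    This sketch copies the bytes out for simplicity; a zero-copy reader
    would keep working on a view of `shm.buf` instead.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:size])
    finally:
        shm.close()


def encode_remote(payload: bytes) -> bytes:
    """Compress the payload before streaming it (zlib is an assumed codec)."""
    return zlib.compress(payload, level=1)


def decode_remote(blob: bytes) -> bytes:
    """Invert encode_remote on the receiving side."""
    return zlib.decompress(blob)
```

In this sketch the local path trades a network round trip for a named memory segment, while the remote path spends some CPU time on compression to reduce the bytes on the wire. The segment's creator is responsible for calling `close()` and `unlink()` once the reader is done.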

📝 Abstract
The rapid emergence of Vision-Language-Action models (VLAs) has had a significant impact on robotics. However, their deployment remains complex due to fragmented interfaces and the inherent communication latency in distributed setups. To address this, we introduce VLAgents, a modular policy server that abstracts VLA inference behind a unified Gymnasium-style protocol. Crucially, its communication layer transparently adapts to the context by supporting both zero-copy shared memory for high-speed simulation and compressed streaming for remote hardware. In this work, we present the architecture of VLAgents and validate it by integrating seven policies, including OpenVLA and Pi Zero. In a benchmark with both local and remote communication, we further demonstrate how it outperforms the default policy servers provided by OpenVLA, OpenPi, and LeRobot. VLAgents is available at https://github.com/RobotControlStack/vlagents
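
To make the idea of a unified Gymnasium-style protocol concrete, here is a minimal sketch of what a common policy interface could look like. The class names `Policy` and `ZeroPolicy`, the `reset`/`act` method signatures, and the `run_episode` helper are hypothetical and are not taken from the VLAgents codebase; the abstract does not specify the actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Policy(ABC):
    """Hypothetical common interface that every wrapped VLA would implement."""

    @abstractmethod
    def reset(self, instruction: str) -> None:
        """Start a new episode with a language instruction."""

    @abstractmethod
    def act(self, observation: Dict[str, Any]) -> List[float]:
        """Map one observation to one action vector."""


class ZeroPolicy(Policy):
    """Stand-in policy that always returns a zero action (illustration only)."""

    def __init__(self, action_dim: int = 7) -> None:
        self.action_dim = action_dim
        self.instruction = ""
        self.steps = 0

    def reset(self, instruction: str) -> None:
        self.instruction = instruction
        self.steps = 0

    def act(self, observation: Dict[str, Any]) -> List[float]:
        self.steps += 1
        return [0.0] * self.action_dim


def run_episode(policy: Policy, observations: List[Dict[str, Any]],
                instruction: str) -> List[List[float]]:
    """Client-side loop: reset once, then query one action per observation."""
    policy.reset(instruction)
    return [policy.act(obs) for obs in observations]
```

Under this assumption, client code such as `run_episode` depends only on `reset` and `act`, so one model can be swapped for another without changing the loop.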
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
deployment complexity
communication latency
fragmented interfaces
distributed robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action models
policy server
zero-copy shared memory
compressed streaming
modular inference