VLAgents: A Policy Server for Efficient VLA Inference

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fragmented interfaces and communication latency that commonly hinder the deployment of Vision-Language-Action (VLA) models in robotic systems. To overcome these limitations, the authors propose a modular policy server that encapsulates VLA inference behind a unified Gymnasium-style interface and introduces a context-aware communication mechanism that adaptively switches between two modes: zero-copy shared memory for local execution to accelerate simulation, and compressed streaming for remote operation to reduce bandwidth overhead. The server integrates seven mainstream policies, including OpenVLA and Pi Zero, and consistently outperforms the default servers of OpenVLA, OpenPi, and LeRobot in both local and remote benchmarks, improving the deployment efficiency and generalizability of VLA systems.

📝 Abstract
The rapid emergence of Vision-Language-Action models (VLAs) has had a significant impact on robotics. However, their deployment remains complex due to fragmented interfaces and the inherent communication latency of distributed setups. To address this, we introduce VLAgents, a modular policy server that abstracts VLA inference behind a unified Gymnasium-style protocol. Crucially, its communication layer transparently adapts to the context by supporting both zero-copy shared memory for high-speed simulation and compressed streaming for remote hardware. In this work, we present the architecture of VLAgents and validate it by integrating seven policies -- including OpenVLA and Pi Zero. In a benchmark with both local and remote communication, we further demonstrate that it outperforms the default policy servers provided by OpenVLA, OpenPi, and LeRobot. VLAgents is available at https://github.com/RobotControlStack/vlagents
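The context-aware communication layer described in the abstract can be sketched as follows. This is a minimal illustration of the general idea, not code from the VLAgents repository: the function names (`select_transport`, `send_observation`) and the localhost heuristic are assumptions made for the example. The sketch picks zero-copy shared memory when client and server share a machine, and compressed streaming otherwise.

```python
import zlib
from multiprocessing import shared_memory

def select_transport(server_host: str) -> str:
    """Pick a transport based on whether the policy server is local.

    A real implementation might probe the connection instead of
    checking the hostname; this heuristic is purely illustrative.
    """
    return "shm" if server_host in ("localhost", "127.0.0.1") else "stream"

def send_observation(obs: bytes, transport: str):
    """Hand an observation (e.g. an encoded camera frame) to the server."""
    if transport == "shm":
        # Local case: place the payload in a shared-memory block that the
        # server process maps directly, avoiding a serialized copy.
        shm = shared_memory.SharedMemory(create=True, size=len(obs))
        shm.buf[: len(obs)] = obs
        return shm
    # Remote case: compress before streaming to reduce bandwidth overhead.
    return zlib.compress(obs)
```

In a Gymnasium-style loop, a client would call `select_transport` once at connection time and then route every `step()` observation through the chosen path, which is what lets the same policy code serve both fast local simulation and bandwidth-limited remote hardware.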
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
deployment complexity
communication latency
fragmented interfaces
distributed robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action models
policy server
zero-copy shared memory
compressed streaming
modular inference
Tobias Julg
University of Technology Nuremberg
Khaled Gamal
University of Technology Nuremberg
Nisarga Nilavadi
University of Technology Nuremberg
Pierre Krack
University of Technology Nuremberg
Seongjin Bien
University of Technology Nuremberg
Michael Krawez
University of Technology Nuremberg
Florian Walter
University of Technology Nuremberg, Machine Intelligence Lab
Machine Intelligence · Robotics · Machine Learning · AI · Cognitive Robotics
Wolfram Burgard
Professor of Computer Science, University of Technology Nuremberg
Robotics · Artificial Intelligence · AI · Machine Learning · Computer Vision