D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models

📅 2026-05-13

🤖 AI Summary

This work addresses the throughput bottleneck in distributed reinforcement learning (RL) for large-scale vision-language-action (VLA) models. The bottleneck stems from the resource conflict between high-fidelity physical simulation and the heavy GPU memory (VRAM) and bandwidth demands of deep learning. To overcome it, the authors propose D-VLA, a framework whose “Plane Decoupling” mechanism physically separates high-frequency training data from low-frequency weight control, eliminating interference between simulation and optimization. D-VLA runs a four-thread asynchronous “Swimlane” pipeline that fully overlaps sampling, inference, gradient computation, and parameter distribution. A dual-pool VRAM management model resolves memory fragmentation, and topology-aware model replication improves communication efficiency. Experiments on benchmarks such as LIBERO show that D-VLA significantly outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models. Trillion-parameter scalability tests show that the framework remains stable and achieves linear speedup.
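The summary describes the Swimlane pipeline and Plane Decoupling only at a high level. The sketch below is a minimal illustration of the general idea under stated assumptions, not the authors' implementation. It assumes four Python threads connected by bounded `queue.Queue` objects as the high-frequency data plane, plus a lock-protected "latest weights" holder as the low-frequency control plane. `WeightSnapshot`, `ControlPlane`, `run_pipeline`, and `publish_every` are hypothetical names. Weights are reduced to a single float, and the learner's update is a toy stand-in for gradient computation. The sketch shows dataflow and overlap only. Python threads do not give true GPU or CPU parallelism here.

```python
import queue
import threading
from dataclasses import dataclass

SENTINEL = None  # end-of-stream marker passed down each data-plane queue


@dataclass(frozen=True)
class WeightSnapshot:
    version: int
    value: float  # hypothetical stand-in for a full parameter set


class ControlPlane:
    """Low-frequency channel holding the most recently published weights."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snap = WeightSnapshot(version=0, value=0.0)

    def publish(self, snap: WeightSnapshot) -> None:
        with self._lock:
            self._snap = snap

    def latest(self) -> WeightSnapshot:
        with self._lock:
            return self._snap


def run_pipeline(num_obs: int = 200, publish_every: int = 20):
    # Data plane: bounded queues carry high-frequency observations and trajectories.
    obs_q: queue.Queue = queue.Queue(maxsize=32)
    traj_q: queue.Queue = queue.Queue(maxsize=32)
    # Updated weights travel from the learner lane to the publisher lane.
    update_q: queue.Queue = queue.Queue(maxsize=8)
    control = ControlPlane()
    # Each counter is written by exactly one lane, so no extra locking is needed.
    stats = {"acted": 0, "trained": 0, "published": 0, "max_version_seen": 0}

    def sampler() -> None:  # lane 1: environment stepping (simulator stand-in)
        for step in range(num_obs):
            obs_q.put(float(step))
        obs_q.put(SENTINEL)

    def inference() -> None:  # lane 2: policy forward pass with whatever weights are current
        while (obs := obs_q.get()) is not SENTINEL:
            snap = control.latest()  # reads the control plane; never waits on the learner
            traj_q.put((obs, obs * snap.value, snap.version))
            stats["acted"] += 1
            stats["max_version_seen"] = max(stats["max_version_seen"], snap.version)
        traj_q.put(SENTINEL)

    def learner() -> None:  # lane 3: toy update standing in for gradient computation
        weights, version = 0.0, 0
        while (item := traj_q.get()) is not SENTINEL:
            obs, _action, _behavior_version = item
            weights += 0.001 * obs
            stats["trained"] += 1
            if stats["trained"] % publish_every == 0:
                version += 1
                update_q.put(WeightSnapshot(version, weights))
        update_q.put(SENTINEL)

    def publisher() -> None:  # lane 4: parameter distribution
        while (snap := update_q.get()) is not SENTINEL:
            control.publish(snap)
            stats["published"] += 1

    lanes = [threading.Thread(target=fn) for fn in (sampler, inference, learner, publisher)]
    for t in lanes:
        t.start()
    for t in lanes:
        t.join()
    return stats, control.latest()


if __name__ == "__main__":
    stats, final = run_pipeline()
    print(stats, final)
```

Because the inference lane reads whichever snapshot is current instead of blocking on the learner, each trajectory records the weight version that produced it (`_behavior_version`). An off-policy correction would need that information, but this sketch ignores it.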

📝 Abstract

The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth demands of deep learning. This conflict often leaves overall throughput constrained by execution-phase inefficiencies. To address these challenges, we propose D-VLA, a high-concurrency, low-latency distributed RL framework for large-scale embodied foundation models. D-VLA introduces "Plane Decoupling," physically isolating high-frequency training data from low-frequency weight control to eliminate interference between simulation and optimization. We further design a four-thread asynchronous "Swimlane" pipeline, enabling full parallel overlap of sampling, inference, gradient computation, and parameter distribution. Additionally, a dual-pool VRAM management model and topology-aware replication resolve memory fragmentation and optimize communication efficiency. Experiments on benchmarks like LIBERO show that D-VLA significantly outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models. In trillion-parameter scalability tests, our framework maintains exceptional stability and linear speedup, providing a robust system for high-performance general-purpose embodied agents.
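The abstract names a dual-pool VRAM management model and topology-aware replication but does not describe how either works. The sketch below illustrates one common way to realize each idea under explicit assumptions, and it may differ from D-VLA. For memory, it assumes one device budget split by a fixed fraction into a persistent pool (weights, optimizer state) and a transient pool (activations, rollout buffers). Both pools use a bump allocator. The transient pool is reset wholesale each iteration, so it cannot fragment. For replication, it assumes a two-stage node-leader broadcast: one copy crosses the slow inter-node link per remote node, then leaders fan out over fast intra-node links. `ArenaPool`, `DualPool`, and `replication_plan` are hypothetical names, and allocations are plain byte offsets rather than real CUDA memory.

```python
from typing import Dict, List, Tuple


class ArenaPool:
    """Fixed-capacity bump allocator: all allocations are released together by reset()."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.offset = 0

    def alloc(self, nbytes: int, align: int = 256) -> int:
        start = (self.offset + align - 1) // align * align  # round up to alignment
        if start + nbytes > self.capacity:
            raise MemoryError(f"{self.name} pool exhausted")
        self.offset = start + nbytes
        return start  # byte offset inside the pool

    def reset(self) -> None:
        self.offset = 0


class DualPool:
    """Splits one device budget into a persistent pool and a per-iteration transient pool."""

    def __init__(self, total: int, persistent_fraction: float) -> None:
        p = int(total * persistent_fraction)
        self.persistent = ArenaPool("persistent", p)        # weights, optimizer state
        self.transient = ArenaPool("transient", total - p)  # activations, rollout buffers


Edge = Tuple[int, int, str]  # (sender rank, receiver rank, link type)


def replication_plan(src: int, ranks_by_node: Dict[str, List[int]]) -> List[Edge]:
    """Two-stage broadcast: one inter-node copy per remote node, then intra-node fan-out."""
    src_node = next(n for n, ranks in ranks_by_node.items() if src in ranks)
    leaders = {n: (src if n == src_node else ranks[0]) for n, ranks in ranks_by_node.items()}
    plan: List[Edge] = []
    for node, leader in leaders.items():  # stage 1: cross the slow link once per remote node
        if node != src_node:
            plan.append((src, leader, "inter"))
    for node, ranks in ranks_by_node.items():  # stage 2: fan out over fast local links
        for r in ranks:
            if r != leaders[node]:
                plan.append((leaders[node], r, "intra"))
    return plan


if __name__ == "__main__":
    pools = DualPool(total=1 << 20, persistent_fraction=0.5)
    pools.persistent.alloc(1000)
    pools.transient.alloc(4096)
    pools.transient.reset()  # end of iteration: transient memory reclaimed in O(1)
    print(replication_plan(0, {"node0": [0, 1, 2, 3], "node1": [4, 5, 6, 7]}))
```

Consider two nodes with four GPUs each and rank 0 as the source. The plan has seven edges and crosses the inter-node link once, whereas a flat broadcast from rank 0 would cross it four times.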

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action

Reinforcement Learning

Distributed Systems

Resource Conflict

Embodied AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Plane Decoupling

Swimlane Pipeline

Distributed Reinforcement Learning