D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models

📅 2026-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the throughput bottleneck in distributed reinforcement learning for large-scale vision-language-action (VLA) models, which arises from the conflict between high-fidelity physical simulation and the heavy VRAM and bandwidth demands of deep learning. The authors propose D-VLA, a framework built around “Plane Decoupling,” which separates high-frequency training data streams from low-frequency weight control so that simulation and optimization do not interfere with each other. D-VLA uses a four-thread asynchronous “Swimlane” pipeline to overlap sampling, inference, gradient computation, and parameter distribution. A dual-pool VRAM management model and topology-aware model replication address memory fragmentation and communication efficiency. The authors report that, on benchmarks such as LIBERO, D-VLA outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models, and that it remains stable with linear speedup in trillion-parameter scalability tests.
📝 Abstract
The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth demands of deep learning. This conflict often leaves overall throughput constrained by execution-phase inefficiencies. To address these challenges, we propose D-VLA, a high-concurrency, low-latency distributed RL framework for large-scale embodied foundation models. D-VLA introduces "Plane Decoupling," physically isolating high-frequency training data from low-frequency weight control to eliminate interference between simulation and optimization. We further design a four-thread asynchronous "Swimlane" pipeline, enabling full parallel overlap of sampling, inference, gradient computation, and parameter distribution. Additionally, a dual-pool VRAM management model and topology-aware replication resolve memory fragmentation and optimize communication efficiency. Experiments on benchmarks like LIBERO show that D-VLA significantly outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models. In trillion-parameter scalability tests, our framework maintains exceptional stability and linear speedup, providing a robust system for high-performance general-purpose embodied agents.
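The abstract gives no implementation details, so the following is only a toy sketch of the general idea: four threads connected by bounded queues (a stand-in for the high-frequency data plane) and a separate versioned weight store (a stand-in for the low-frequency control plane). Every name here (`WeightStore`, `run_pipeline`, `sync_every`) is hypothetical and chosen for illustration; none of it is taken from D-VLA itself, and the "policy" and "gradient step" are placeholder arithmetic.

```python
import queue
import threading

STOP = object()  # sentinel passed down the pipeline to shut each lane down


class WeightStore:
    """Hypothetical control plane: infrequent, versioned weight snapshots."""

    def __init__(self, weights):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = weights

    def publish(self, weights):
        with self._lock:
            self._version += 1
            self._weights = weights

    def latest(self):
        with self._lock:
            return self._version, self._weights


def run_pipeline(num_steps=20, sync_every=5):
    store = WeightStore(0.0)
    # Data plane: bounded queues carry per-step data between lanes.
    obs_q = queue.Queue(maxsize=8)     # sampling  -> inference
    traj_q = queue.Queue(maxsize=8)    # inference -> gradient computation
    update_q = queue.Queue(maxsize=8)  # gradient  -> parameter distribution
    processed = []

    def sampler():
        for step in range(num_steps):
            obs_q.put(float(step))  # stand-in for a simulator observation
        obs_q.put(STOP)

    def inferer():
        while (obs := obs_q.get()) is not STOP:
            version, w = store.latest()  # reads whatever weights are current
            traj_q.put((obs, w * obs, version))  # toy policy output
        traj_q.put(STOP)

    def learner():
        w = 0.0
        while (item := traj_q.get()) is not STOP:
            w += 0.01  # placeholder for a real gradient step
            processed.append(item)
            if len(processed) % sync_every == 0:
                update_q.put(w)  # weights leave the learner only occasionally
        update_q.put(STOP)

    def syncer():
        while (w := update_q.get()) is not STOP:
            store.publish(w)

    lanes = [threading.Thread(target=f) for f in (sampler, inferer, learner, syncer)]
    for t in lanes:
        t.start()
    for t in lanes:
        t.join()
    return len(processed), store.latest()[0]


if __name__ == "__main__":
    print(run_pipeline())  # (20, 4): 20 samples consumed, 4 weight versions published
```

In this sketch the inference lane never waits for a weight update; it simply uses the most recent published version, which is why samples can be tagged with a stale version number. A real system would also have to decide how much staleness the learning algorithm tolerates, which the abstract does not specify.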
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
Reinforcement Learning
Distributed Systems
Resource Conflict
Embodied AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plane Decoupling
Swimlane Pipeline
Distributed Reinforcement Learning
VRAM Management
Vision-Language-Action Models
Authors

Yucheng Guo
Princeton University
Stochastic Analysis, Partial Differential Equations, Mathematical Finance

Yongjian Guo
Tsinghua University, JDT AI Infra

Zhong Guan
PhD of Electrical and Computer Engineering, UCSB
Electromigration, Reliability, SRAM, EDA, Simulation

Wen Huang
Tsinghua University
Generative model

Haoran Sun
Peking University
Algorithmic game theory, Machine learning, Large Language Models

Haodong Yue
Tsinghua University, JDT AI Infra

Xiaolong Xiang
Beihang University, JDT AI Infra

Shuai Di
JDT AI Infra

Zhen Sun
DSA Thrust, HKUST(GZ)
LLM security

Luqiao Wang
Beihang University, JDT AI Infra

Junwu Xiong
JDT AI Infra

Yicheng Gong
JDT AI Infra