MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) training systems suffer from poor cluster scalability and low memory utilization due to strong cross-node dependencies. To address this, we propose a distributed dataflow architecture tailored for large-scale RL training, featuring a novel distributed transfer dock and an allgather-swap mechanism that decouple sample streaming from resharding traffic, eliminating centralized scheduling bottlenecks and substantially reducing communication overhead and redundant memory consumption. Integrated with a dynamic controller, warehouse-style deployment, optimized resharding communication, and multi-dimensional parallelism acceleration, our design enables holistic system-level co-optimization. Evaluated on a 384-chip Ascend NPU supercomputing cluster, our system achieves 1.42 to 3.97 times higher throughput than state-of-the-art baselines and efficiently supports alignment training of models ranging from billions to hundreds of billions of parameters, including Qwen and DeepSeek.

📝 Abstract
Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents dataflow between nodes. Owing to the heavy cross-node dependencies, RL training systems usually suffer from poor cluster scalability and low memory utilization. In this article, we introduce MindSpeed RL, an effective and efficient system for large-scale RL training. Unlike existing centralized methods, MindSpeed RL organizes the essential data dependencies in RL training, i.e., the sample flow and the resharding flow, from a distributed view. On the one hand, a distributed transfer dock strategy, which builds controllers and warehouses on top of the conventional replay buffer, is designed to reduce the dispatch overhead in the sample flow. On the other hand, a practical allgather-swap strategy is presented to eliminate redundant memory usage in the resharding flow. In addition, MindSpeed RL integrates numerous parallelization strategies and acceleration techniques for systematic optimization. Compared with existing state-of-the-art systems, comprehensive experiments on the RL training of the popular Qwen2.5-Dense-7B/32B, Qwen3-MoE-30B, and DeepSeek-R1-MoE-671B models show that MindSpeed RL increases throughput by 1.42 to 3.97 times. Finally, we open-source MindSpeed RL and perform all the experiments on an Ascend super pod with 384 neural processing units (NPUs) to demonstrate the powerful performance and reliability of Ascend.
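The memory argument behind the allgather-swap strategy can be illustrated with a toy model. This is a hedged sketch, not the paper's implementation: the function names and the list-based "shards" are invented for illustration. The point is that gathering shard by shard and swapping out each training-side copy as soon as it has been gathered keeps peak residency at one full copy of the weights, whereas a naive allgather holds the full tensor alongside every still-resident training shard.

```python
# Toy memory model of resharding (illustrative only; not MindSpeed RL code).
# Each "shard" is a list of elements; memory is counted in elements.

def reshard_naive(train_shards):
    # Naive resharding: materialize the full tensor while every training
    # shard is still resident -> peak = full copy + all shards (2x).
    full = [x for shard in train_shards for x in shard]  # "allgather"
    peak = len(full) + sum(len(s) for s in train_shards)
    return full, peak

def reshard_allgather_swap(train_shards):
    # Allgather-swap: gather shard by shard and release ("swap out") each
    # training-side copy immediately, so redundant copies never accumulate.
    full, resident = [], sum(len(s) for s in train_shards)
    peak = resident
    for i, shard in enumerate(train_shards):
        full.extend(shard)           # gather this shard into the full copy
        resident -= len(shard)       # swap out the training-side copy
        train_shards[i] = None
        peak = max(peak, len(full) + resident)
    return full, peak

shards = [[1, 2], [3, 4], [5, 6], [7, 8]]
full_a, peak_a = reshard_naive([list(s) for s in shards])
full_b, peak_b = reshard_allgather_swap([list(s) for s in shards])
# peak_a counts two full copies; peak_b stays at one full copy.
```

Under this toy accounting, both paths produce the same gathered weights, but the swap ordering halves the peak footprint, which is the redundancy the abstract says the strategy eliminates.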
Problem

Research questions and friction points this paper is trying to address.

Poor cluster scalability in large-scale RL training
Low memory utilization in existing RL systems
Heavy cross-node dependencies in distributed RL dataflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed transfer dock strategy for sample flow
Allgather-swap strategy for resharding flow
Integrated parallelization and acceleration techniques
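The transfer dock idea above can be sketched as a minimal mental model. This is a hedged illustration under assumed semantics, not MindSpeed RL's actual API: `Warehouse`, `Controller`, and all method names are invented. The key property is that the controller tracks only metadata (per-warehouse counts), while sample payloads live in per-group warehouses and never pass through a central dispatcher, which is how a distributed dock avoids the bottleneck of a centralized replay buffer.

```python
# Illustrative transfer-dock sketch (not the paper's implementation).
from collections import deque

class Warehouse:
    """Holds the actual samples for one worker group, replacing one
    slice of a centralized replay buffer."""
    def __init__(self):
        self.samples = deque()
    def put(self, sample):
        self.samples.append(sample)
    def get(self):
        return self.samples.popleft()

class Controller:
    """Tracks only (warehouse_id -> count) metadata, so dispatch
    decisions are decoupled from the sample payloads themselves."""
    def __init__(self, warehouses):
        self.warehouses = warehouses
        self.counts = {wid: 0 for wid in warehouses}
    def register(self, wid, sample):
        self.warehouses[wid].put(sample)
        self.counts[wid] += 1
    def route(self):
        # Pull from the fullest warehouse; the payload is fetched
        # directly from that warehouse, not relayed by the controller.
        wid = max(self.counts, key=self.counts.get)
        self.counts[wid] -= 1
        return self.warehouses[wid].get()

docks = {i: Warehouse() for i in range(2)}
ctrl = Controller(docks)
ctrl.register(0, "rollout-a")
ctrl.register(1, "rollout-b")
ctrl.register(1, "rollout-c")
first = ctrl.route()  # served from warehouse 1, the fullest
```

In a real deployment the warehouses would be distributed across nodes and the transfers asynchronous; the sketch only shows why moving metadata instead of samples removes the central dispatch hot spot.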
Authors

Liangjun Feng (Huawei Lianqiu Lake R&D Center, Xicen Community, Jinze Town, Qingpu District, Shanghai 201718, China)
Chenyi Pan (Huawei Lianqiu Lake R&D Center)
Xinjie Guo (Huawei Lianqiu Lake R&D Center)
Fei Mei (Huawei Lianqiu Lake R&D Center)
Benzhe Ning (Huawei Lianqiu Lake R&D Center)
Jianxiang Zhang (Huawei Lianqiu Lake R&D Center)
Xinyang Liu (Huawei Lianqiu Lake R&D Center)
Beirong Zhou (Huawei Lianqiu Lake R&D Center)
Zeng Shu (Huawei Lianqiu Lake R&D Center)
Chang Liu (Huawei Lianqiu Lake R&D Center)
Guang Yang (Huawei Lianqiu Lake R&D Center)
Zhenyu Han (Ph.D., Department of Electronic Engineering, Tsinghua University, China)
Jiangben Wang (Huawei Lianqiu Lake R&D Center)
Bo Wang (Huawei Lianqiu Lake R&D Center)