RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation

📅 2025-09-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Reinforcement learning (RL) training suffers from low hardware utilization and constrained throughput due to highly heterogeneous and dynamic workloads. To address this, we propose RLinf, the first system to introduce the Macro-to-Micro Flow (M2Flow) paradigm, which automatically decomposes high-level RL policy workflows into fine-grained, schedulable micro-operation streams. RLinf supports adaptive communication, lightweight context switching, and elastic pipeline orchestration, guided by performance-profiling-driven scheduling that optimizes spatiotemporal execution structures. Evaluated on reasoning and embodied RL tasks, RLinf achieves 1.1×–2.13× higher end-to-end training throughput than state-of-the-art systems, while significantly improving GPU utilization and runtime flexibility. By unifying workflow abstraction with system-aware scheduling, RLinf delivers an efficient, scalable systems foundation for large-scale RL training.
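
To make the M2Flow idea concrete, here is a minimal, hypothetical sketch under simple assumptions: one macro RL step (rollout → reward → update) is decomposed into per-micro-batch micro-operations, and a toy scheduler overlaps them as a pipeline. Every name here (`STAGES`, `pipeline_start_times`, the unit-cost model) is an illustrative assumption, not RLinf's actual API.

```python
# Toy macro-to-micro flow transformation: split one macro RL step into
# per-micro-batch micro-ops and compute a pipelined schedule. Each op takes
# one time unit and obeys two dependencies:
#   (1) stage s of batch b starts after stage s-1 of batch b (data flow),
#   (2) stage s of batch b starts after stage s of batch b-1
#       (one executor per stage).
# Hypothetical sketch only; not RLinf's real interface.
from itertools import product

STAGES = ["rollout", "reward", "update"]  # one macro RL training step

def pipeline_start_times(num_stages: int, num_micro_batches: int) -> dict:
    """Return {(stage_idx, batch_idx): start_time} under unit op cost."""
    start = {}
    for b, s in product(range(num_micro_batches), range(num_stages)):
        after_prev_stage = start[(s - 1, b)] + 1 if s > 0 else 0
        after_prev_batch = start[(s, b - 1)] + 1 if b > 0 else 0
        start[(s, b)] = max(after_prev_stage, after_prev_batch)
    return start

if __name__ == "__main__":
    m = 8  # micro-batches carved out of one macro batch
    start = pipeline_start_times(len(STAGES), m)
    pipelined = max(start.values()) + 1   # makespan with overlap
    sequential = len(STAGES) * m          # makespan with no overlap
    print(f"pipelined: {pipelined} units, sequential: {sequential} units")
```

In this toy model, pipelining the 8 micro-batches finishes in stages + batches - 1 = 10 time units instead of 24; the real system additionally decides placement, communication, and scheduling for dynamic workloads.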

๐Ÿ“ Abstract
Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to low hardware utilization and slow training on existing systems. In this paper, we present RLinf, a high-performance RL training system built on our key observation that the major roadblock to efficient RL training is system flexibility. To maximize both flexibility and efficiency, RLinf is built atop a novel RL system design paradigm called macro-to-micro flow transformation (M2Flow), which automatically breaks down high-level, easy-to-compose RL workflows along both the temporal and spatial dimensions and recomposes them into optimized execution flows. Supported by the adaptive communication capability of RLinf workers, we devise context switching and elastic pipelining to realize the M2Flow transformation, and a profiling-guided scheduling policy to generate optimal execution plans. Extensive evaluations on both reasoning RL and embodied RL tasks demonstrate that RLinf consistently outperforms state-of-the-art systems, achieving a 1.1x-2.13x speedup in end-to-end training throughput.
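
The "context switching" mentioned in the abstract can be pictured with a small, hedged stand-in: when a generation worker and a training worker share one GPU, the inactive context's state is paged out to host memory so the active one fits. The classes below (`Context`, `GpuSlot`, `switch_to`) are invented for illustration and simulate device memory with plain numbers; they are not RLinf's mechanism.

```python
# Hedged toy of lightweight context switching between colocated workers
# (e.g., a rollout/generation context and a training context on one GPU).
# Device memory is simulated; every name here is an assumption.
from dataclasses import dataclass, field

@dataclass
class Context:
    name: str
    size_gb: float          # weights + KV cache, or grads + optimizer state
    on_gpu: bool = False

@dataclass
class GpuSlot:
    capacity_gb: float
    resident: list = field(default_factory=list)

    def used_gb(self) -> float:
        return sum(c.size_gb for c in self.resident)

    def switch_to(self, ctx: Context) -> None:
        """Offload resident contexts to host memory until ctx fits,
        then activate ctx on the GPU."""
        while self.resident and self.used_gb() + ctx.size_gb > self.capacity_gb:
            evicted = self.resident.pop(0)
            evicted.on_gpu = False
            print(f"offload {evicted.name} ({evicted.size_gb} GB) to host")
        self.resident.append(ctx)
        ctx.on_gpu = True
        print(f"activate {ctx.name} ({ctx.size_gb} GB) on GPU")

if __name__ == "__main__":
    gpu = GpuSlot(capacity_gb=80.0)
    generation = Context("generation", size_gb=60.0)
    training = Context("training", size_gb=70.0)
    gpu.switch_to(generation)  # rollout phase
    gpu.switch_to(training)    # update phase: generation state paged out
```
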
Problem

Research questions and friction points this paper is trying to address.

Improving hardware utilization in large-scale reinforcement learning workflows
Addressing system flexibility bottlenecks in reinforcement learning training
Optimizing execution flows through automated workflow decomposition and recomposition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Macro-to-micro flow transformation paradigm
Adaptive communication with context switching
Profiling-guided optimal execution scheduling (see the cost-model sketch after this list)
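
As a hedged illustration of profiling-guided scheduling, the sketch below feeds profiled (here, made-up) per-stage costs into a simple analytical pipeline model and enumerates candidate micro-batch counts to pick the smallest estimated makespan. The cost model, overhead constant, and numbers are assumptions, not taken from the paper.

```python
# Toy profiling-guided scheduler: choose a micro-batch count from profiled
# per-stage costs. Model: a stage of cost t is split into m micro-ops of
# cost t/m plus a fixed per-op overhead; the pipeline fills once, then
# drains at the rate of the slowest stage. All numbers are illustrative.

def estimated_makespan(stage_costs, m, overhead=0.002):
    """Estimated end-to-end time for one macro step with m micro-batches."""
    per_op = [t / m + overhead for t in stage_costs]
    return sum(per_op) + (m - 1) * max(per_op)

def choose_micro_batches(stage_costs, candidates=range(1, 65)):
    """Enumerate candidates and return the one minimizing the estimate."""
    return min(candidates, key=lambda m: estimated_makespan(stage_costs, m))

if __name__ == "__main__":
    profiled = {"rollout": 1.20, "reward": 0.15, "update": 0.90}  # seconds
    costs = list(profiled.values())
    best = choose_micro_batches(costs)
    print(f"best micro-batch count: {best}")
    print(f"estimated makespan: {estimated_makespan(costs, best):.3f} s")
```

A real scheduler would profile these costs at runtime and search a much richer space (worker placement, pipeline shape, communication plans), but the profile, model, search loop has the same shape.
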
Authors

Chao Yu (Tsinghua University)
Yuanqing Wang (Materials Genome Institute, Shanghai University) - Catalysis, Materials Science, Environmental Science
Zhen Guo (Infinigence AI)
Hao Lin (Infinigence AI)
Si Xu (Infinigence AI)
Hongzhi Zang (Tsinghua University)
Quanlu Zhang (Infinigence AI)
Yongji Wu (UC Berkeley) - Machine Learning Systems, Datacenter Networks
Chunyang Zhu (Infinigence AI)
Junhao Hu (Infinigence AI)
Zixiao Huang (Tsinghua University)
Mingjie Wei (Xidian University) - 3D Human, Motion Generation, 3D Human Pose Estimation
Yuqing Xie (Tsinghua University)
Ke Yang (Zhongguancun Academy)
Bo Dai (Beihang University)
Zhexuan Xu (Tsinghua University)
Xiangyuan Wang (Wuhan University) - Neuromorphic Vision, Image Processing, Pattern Recognition
Xu Fu (Infinigence AI)
Zhihao Liu (Zhongguancun Academy)
Kang Chen (Peking University)
Weilin Liu (University of Ottawa) - Microwave Photonics, Photonic Integrated Circuits, Silicon Photonics, All-optical Signal Processing
Gang Liu (Tsinghua University)
Boxun Li (Infinigence AI)
Jianlei Yang (Beihang University) - Deep Learning, Computer Architecture, Neuromorphic Computing, Spintronics, EDA/VLSI
Zhi Yang (Peking University)