OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL

📅 2026-01-03
🏛️ arXiv.org
🤖 AI Summary
This work addresses two problems in disaggregated reinforcement learning: a generation stage that becomes a compute bottleneck under dynamic workload shifts and execution imbalance, and diverse, shifting traffic patterns that overwhelm conventional network fabrics. The authors propose OrchestrRL, a framework that co-designs an adaptive compute scheduler with RFabric, a reconfigurable hybrid optical-electrical network fabric. The scheduler dynamically adjusts parallelism within and across generation steps and rebalances requests to mitigate stragglers, while RFabric uses optical circuit switches to reconfigure its topology in real time for training collectives, generation under different parallelism configurations, and periodic inter-cluster weight synchronization. On a physical testbed with 48 H800 GPUs, OrchestrRL achieves up to a 1.40× throughput improvement. At larger scale, evaluated with the high-fidelity simulator RLSim, RFabric achieves better performance-cost efficiency than static Fat-Tree networks.

📝 Abstract
Post-training with reinforcement learning (RL) has greatly enhanced the capabilities of large language models. Disaggregating the generation and training stages in RL into a parallel, asynchronous pipeline offers the potential for flexible scaling and improved throughput. However, it still faces two critical challenges. First, the generation stage often becomes a bottleneck due to dynamic workload shifts and severe execution imbalances. Second, the decoupled stages result in diverse and dynamic network traffic patterns that overwhelm conventional network fabrics. This paper introduces OrchestrRL, an orchestration framework that dynamically manages compute and network rhythms in disaggregated RL. To improve generation efficiency, OrchestrRL employs an adaptive compute scheduler that dynamically adjusts parallelism to match workload characteristics within and across generation steps. This accelerates execution while continuously rebalancing requests to mitigate stragglers. To address the dynamic network demands inherent in disaggregated RL -- further intensified by parallelism switching -- we co-design RFabric, a reconfigurable hybrid optical-electrical fabric. RFabric leverages optical circuit switches at selected network tiers to reconfigure the topology in real time, enabling workload-aware circuits for (i) layer-wise collective communication during training iterations, (ii) generation under different parallelism configurations, and (iii) periodic inter-cluster weight synchronization. We evaluate OrchestrRL on a physical testbed with 48 H800 GPUs, demonstrating up to a 1.40x throughput improvement. Furthermore, we develop RLSim, a high-fidelity simulator, to evaluate RFabric at scale. Our results show that RFabric achieves superior performance-cost efficiency compared to static Fat-Tree networks, establishing it as a highly effective solution for large-scale RL workloads.
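
The abstract mentions continuously rebalancing requests to mitigate stragglers but does not spell out the policy. As a rough illustration only, the sketch below uses a generic greedy longest-processing-time heuristic, which is not taken from the paper. The function name `rebalance`, the per-request `remaining_tokens` estimate, and the worker indexing are all hypothetical.

```python
import heapq


def rebalance(remaining_tokens, num_workers):
    """Assign requests to workers so the largest per-worker load stays small.

    Illustrative assumption: `remaining_tokens` maps a request id to an
    estimate of the tokens it still has to generate. This is a generic
    greedy heuristic, not OrchestrRL's scheduler.
    """
    # Min-heap of (current_load, worker_id); the least-loaded worker pops first.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}

    # Place the longest requests first, each on the currently lightest worker.
    for req_id, tokens in sorted(remaining_tokens.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(req_id)
        heapq.heappush(heap, (load + tokens, worker))

    loads = {w: sum(remaining_tokens[r] for r in reqs)
             for w, reqs in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    pending = {"a": 900, "b": 500, "c": 400, "d": 100, "e": 100}
    assignment, loads = rebalance(pending, num_workers=2)
    print(assignment)  # {0: ['a', 'd'], 1: ['b', 'c', 'e']}
    print(loads)       # {0: 1000, 1: 1000}
```

A real scheduler would also have to account for the cost of migrating in-flight requests and for the parallelism configuration of each worker, which this sketch ignores.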

Problem

Research questions and friction points this paper is trying to address.

Disaggregated RL
Dynamic Workload
Network Traffic
Execution Imbalance
Scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Disaggregated RL
Dynamic Orchestration
Adaptive Compute Scheduling
Reconfigurable Optical-Electrical Fabric
RFabric