🤖 AI Summary
To address the difficulty of harvesting preemptible GPU resources (e.g., cloud spot instances or idle cluster GPUs) for reinforcement learning (RL) training of large language models, where frequent interruptions cause low utilization, throughput bottlenecks, and high costs, this paper proposes RLBoost. RLBoost introduces three key techniques: (1) an adaptive rollout offloading mechanism that dynamically migrates compute-intensive rollouts to idle or low-cost GPUs; (2) a pull-based weight synchronization protocol that decouples training from inference and reduces communication overhead; and (3) token-level response migration, which enables fine-grained fault tolerance and rapid recovery. RLBoost employs a hybrid architecture that integrates preemptibility-aware scheduling, dynamic load balancing, and lightweight state synchronization. Experiments demonstrate that, compared with training on on-demand GPUs alone, RLBoost achieves 1.51–1.97× higher training throughput and improves throughput per dollar by 28%–49%.
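To make the migration idea concrete, here is a minimal, self-contained Python sketch of token-level response migration. It is an illustration under assumed names (`Response`, `ToyWorker`, and `rollout_with_migration` are hypothetical, not RLBoost's API): each response carries its decoded-token prefix, so when a preemptible instance disappears mid-generation, the partial response is simply re-enqueued and another instance resumes from the last token rather than regenerating from scratch.

```python
# Toy sketch of token-level response migration (hypothetical names,
# not RLBoost's implementation): a response carries its token prefix,
# so a preempted rollout can be resumed on any other instance.
import random
from dataclasses import dataclass, field

@dataclass
class Response:
    prompt: str
    tokens: list = field(default_factory=list)  # tokens decoded so far
    done: bool = False

class ToyWorker:
    """Stand-in for a rollout instance on a preemptible GPU."""
    def __init__(self, preempt_prob=0.05):
        self.preempt_prob = preempt_prob

    def step(self, resp: Response) -> bool:
        """Decode one token; return False if the instance is preempted."""
        if random.random() < self.preempt_prob:
            return False                         # simulated spot preemption
        resp.tokens.append(len(resp.tokens))     # fake "next token"
        return True

def rollout_with_migration(prompts, workers, max_tokens=16):
    queue = [Response(p) for p in prompts]
    finished = []
    while queue:
        resp = queue.pop()
        worker = random.choice(workers)          # toy load balancing
        while len(resp.tokens) < max_tokens:
            if not worker.step(resp):
                queue.append(resp)               # migrate partial response
                break
        else:
            resp.done = True
            finished.append(resp)
    return finished

if __name__ == "__main__":
    done = rollout_with_migration(["a", "b", "c"], [ToyWorker(), ToyWorker()])
    print([len(r.tokens) for r in done])         # every response completes
```

Because only the token list crosses instances, the recovery unit is a single partially generated response, which is what makes preemption handling cheap in this scheme.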
📝 Abstract
Reinforcement learning (RL) has become essential for unlocking advanced reasoning capabilities in large language models (LLMs). RL workflows interleave rollout and training stages with fundamentally different resource requirements: rollout typically dominates overall execution time yet scales efficiently across multiple independent instances, whereas training requires tightly coupled GPUs with full-mesh communication. Existing RL frameworks fall into two categories. Co-located architectures fail to resolve this resource tension because they force both stages to share the same GPUs. Disaggregated architectures avoid it, but without modifying well-established RL algorithms they suffer from resource under-utilization. Meanwhile, preemptible GPU resources, i.e., spot instances on public clouds and spare capacity in production clusters, present significant cost-saving opportunities for accelerating RL workflows, if efficiently harvested for rollout.
In this paper, we present RLBoost, a systematic solution for cost-efficient RL training that harvests preemptible GPU resources. Our key insight is that rollout's stateless and embarrassingly parallel nature aligns perfectly with preemptible and often fragmented resources. To utilize these resources efficiently despite frequent and unpredictable availability changes, RLBoost adopts a hybrid architecture with three key techniques: (1) adaptive rollout offload, which dynamically adjusts the workload kept on the reserved (on-demand) cluster; (2) pull-based weight transfer, which quickly provisions newly available instances; and (3) token-level response collection and migration, which enable efficient preemption handling and continuous load balancing. Extensive experiments show that RLBoost improves training throughput by 1.51x-1.97x and cost efficiency by 28%-49% compared with using only on-demand GPU resources.
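As a rough illustration of the pull-based idea, the sketch below uses invented names (`WeightStore`, `RolloutInstance`; the abstract does not describe RLBoost's actual interfaces): the trainer publishes versioned weights once, and each rollout instance, including a freshly provisioned spot instance, pulls the latest version on its own schedule, so weight distribution never blocks the training loop.

```python
# Minimal sketch of pull-based weight transfer (assumed names, not
# RLBoost's API): the trainer publishes versioned weights; rollout
# instances pull them lazily instead of being pushed to.
import threading

class WeightStore:
    """Versioned, thread-safe weight repository the trainer writes to."""
    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = None

    def publish(self, weights):                  # called by the trainer
        with self._lock:
            self._version += 1
            self._weights = weights

    def pull(self, have_version):                # called by rollout instances
        """Return (version, weights) if newer than `have_version`, else None."""
        with self._lock:
            if self._version > have_version:
                return self._version, self._weights
        return None

class RolloutInstance:
    def __init__(self, store: WeightStore):
        self.store = store
        self.version = -1
        self.weights = None

    def sync(self):
        """Pull newer weights if available; cheap no-op otherwise."""
        update = self.store.pull(self.version)
        if update is not None:
            self.version, self.weights = update

store = WeightStore()
store.publish({"layer0": [0.1, 0.2]})            # trainer finishes a step
fresh = RolloutInstance(store)                   # newly provisioned spot instance
fresh.sync()                                     # pulls the latest weights on startup
print(fresh.version)                             # -> 1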