APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
In reinforcement learning, rollout generation suffers from the long-tailed distribution of response lengths, causing frequent GPU idleness and low batch utilization—severely limiting training scalability. To address this, we propose Active Partial Rollouts (APRIL), the first systematic solution to resource waste induced by response-length heterogeneity. APRIL achieves high-density batch packing without discarding any rollout by jointly leveraging overscheduling, dynamic truncation, and reuse of incomplete samples. It is fully compatible with mainstream RLHF algorithms—including GRPO, DAPO, and GSPO—and supports heterogeneous hardware (NVIDIA/AMD) as well as frameworks such as Slime RL. Experiments across diverse tasks demonstrate that APRIL improves training throughput by up to 44%, accelerates convergence, and yields up to an 8% absolute gain in task accuracy.

📝 Abstract
Reinforcement learning (RL) has become a cornerstone in advancing large language models (LLMs). Successive generations, including the GPT-o series, DeepSeek-R1, Kimi-K1.5, Grok 4, and GLM-4.5, have relied on large-scale RL training to enhance reasoning and coding capabilities. To meet the community's growing RL needs, numerous RL frameworks have been proposed. Most of these frameworks primarily rely on inference engines for rollout generation and training engines for policy updates. However, RL training remains computationally expensive, with rollout generation accounting for more than 90% of total runtime. In addition, its efficiency is often constrained by the long-tail distribution of rollout response lengths, where a few lengthy responses stall entire batches, leaving GPUs idle and underutilized. As model and rollout sizes continue to grow, this bottleneck increasingly limits scalability. To address this challenge, we propose Active Partial Rollouts in Reinforcement Learning (APRIL), which mitigates long-tail inefficiency. In the rollout phase, APRIL over-provisions rollout requests, terminates once the target number of responses is reached, and recycles incomplete responses for continuation in future steps. This strategy ensures that no rollouts are discarded while substantially reducing GPU idle time. Experiments show that APRIL improves rollout throughput by up to 44% across commonly used RL algorithms (GRPO, DAPO, GSPO), accelerates convergence, and achieves up to 8% higher final accuracy across tasks. Moreover, APRIL is both framework and hardware agnostic: it is already integrated into the slime RL framework and deployable on NVIDIA and AMD GPUs alike. Taken together, this work unifies system-level and algorithmic considerations in proposing APRIL, with the aim of advancing RL training efficiency and inspiring further optimizations in RL systems.
Problem

Research questions and friction points this paper is trying to address.

RL training efficiency is limited by long-tail rollout response lengths
Lengthy responses stall batches, causing GPU idle time and underutilization
Rollout generation accounts for over 90% of total RL runtime cost
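The long-tail stall can be made concrete with a toy utilization calculation (illustrative numbers, not from the paper): in a synchronous rollout batch, every slot waits for the longest response, so the fraction of useful decoding work is `sum(lengths) / (batch_size * max(lengths))`.

```python
# Toy illustration of long-tail batch stalling (hypothetical lengths):
# in a synchronous rollout, the whole batch waits for the longest response,
# so GPU utilization is bounded by sum(lengths) / (batch_size * max(lengths)).
lengths = [40, 55, 60, 70, 80, 90, 110, 2000]  # one long-tail response

utilization = sum(lengths) / (len(lengths) * max(lengths))
print(f"batch utilization: {utilization:.1%}")  # → batch utilization: 15.7%
```

A single 2000-token straggler drags an eight-slot batch below 16% utilization, which is the inefficiency APRIL targets.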
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active Partial Rollouts (APRIL) for RL efficiency
Over-provisions and recycles rollouts to reduce GPU idle time
Framework-agnostic method improving throughput and final accuracy
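The over-provision / early-terminate / recycle loop described above can be sketched as follows. This is a minimal simulation, not the paper's implementation: `Rollout`, `april_step`, and the one-token-per-tick decoder are assumptions for illustration only.

```python
import random
from dataclasses import dataclass

@dataclass
class Rollout:
    prompt_id: int
    target_len: int        # total tokens this response needs
    tokens_done: int = 0   # progress, carried across training steps

    @property
    def finished(self) -> bool:
        return self.tokens_done >= self.target_len

def april_step(pool, batch_target, over_provision, rng):
    """One rollout step: over-provision requests, stop once batch_target
    responses are complete, and recycle unfinished responses into `pool`."""
    n_launch = int(batch_target * over_provision)
    # Resume recycled partial rollouts first, then start fresh prompts
    # with long-tailed target lengths.
    active = [pool.pop() for _ in range(min(len(pool), n_launch))]
    while len(active) < n_launch:
        active.append(Rollout(prompt_id=rng.randrange(10_000),
                              target_len=rng.randint(5, 200)))
    finished = []
    # Toy decoding loop: each tick advances every active rollout one token.
    while len(finished) < batch_target:
        for r in active:
            r.tokens_done += 1
        finished.extend(r for r in active if r.finished)
        active = [r for r in active if not r.finished]
    # Early termination: in-flight responses are truncated, not discarded;
    # they resume from `tokens_done` in a later step.
    pool.extend(active)
    return finished, pool

rng = random.Random(0)
batch, pool = april_step([], batch_target=8, over_provision=1.5, rng=rng)
# `batch` holds the completed responses; `pool` holds truncated partials
# that the next call to april_step will resume instead of regenerating.
```

The key property of the sketch is that the step returns as soon as enough responses finish, so no one waits on the longest stragglers, and the stragglers' partial progress is reused rather than thrown away.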
👥 Authors

Yuzhen Zhou · Advanced Micro Devices, Inc. (AMD)
Jiajun Li · Carnegie Mellon University (CMU)
Yusheng Su · AMD | Tsinghua University
Gowtham Ramesh · Advanced Micro Devices, Inc. (AMD)
Zilin Zhu · LMSYS Org
Xiang Long · LMSYS Org
Chenyang Zhao · LMSYS Org
Jin Pan · LMSYS Org
Xiaodong Yu · Advanced Micro Devices, Inc. (AMD)
Ze Wang · Advanced Micro Devices, Inc. (AMD)
Kangrui Du · Georgia Institute of Technology
Jialian Wu · AMD GenAI
Ximeng Sun · Advanced Micro Devices, Inc. (AMD)
Jiang Liu · Advanced Micro Devices, Inc. (AMD)
Qiaolin Yu · LMSYS Org
Hao Chen · Advanced Micro Devices, Inc. (AMD)
Zicheng Liu · Advanced Micro Devices, Inc. (AMD)
Emad Barsoum · AMD, Columbia University