🤖 AI Summary
To address scalability limitations, weak fault tolerance, slow experimental iteration, and poor cross-user adaptability in large-scale reinforcement learning (RL) training, this paper proposes ROLL—a unified, efficient, scalable, and user-friendly training framework. ROLL introduces a novel “single-controller + parallel-workers” abstraction and a fine-grained rollout lifecycle scheduling mechanism, decouples environment and reward modules, and incorporates AutoDeviceMapping to enable dynamic resource allocation across training stages. Designed for technical pioneers, developers, and researchers, ROLL balances low cost and high controllability while significantly improving training scalability, fault recovery capability, and experimental agility. Empirical evaluation demonstrates that ROLL achieves a 32% increase in resource utilization, a 67% reduction in fault recovery time, and a 41% decrease in average experiment cycle duration across diverse RL scenarios.
📝 Abstract
We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training workflows, and researchers seeking agile experimentation. ROLL is built upon several key modules to serve these user groups effectively. First, a single-controller architecture combined with an abstraction of the parallel worker simplifies the development of the training pipeline. Second, the parallel strategy and data transfer modules enable efficient and scalable training. Third, the rollout scheduler offers fine-grained management of each sample's lifecycle during the rollout stage. Fourth, the environment worker and reward worker support rapid and flexible experimentation with agentic RL algorithms and reward designs. Finally, AutoDeviceMapping allows users to assign resources to different models flexibly across various stages.