JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses inefficiencies in reinforcement learning (RL) post-training of large language models caused by stage-level pipeline abstractions, which hide intra-stage and inter-worker load imbalance and leave resources underutilized. The authors propose JigsawRL, a framework that explores Pipeline Multiplexing as a new dimension of RL parallelism. JigsawRL decomposes each pipeline into a Sub-Stage Graph to expose fine-grained load distribution, then combines dynamic resource allocation, migration of long-tail rollouts across workers, and a look-ahead heuristic graph scheduler. Evaluated on 4–64 H100/A100 GPUs, JigsawRL achieves up to 1.85× the synchronous RL throughput of Verl and up to 1.54× the asynchronous RL throughput of StreamRL and AReaL, and supports heterogeneous pipelines with a moderate latency trade-off.

📝 Abstract

We present JigsawRL, a cost-efficient framework that explores Pipeline Multiplexing as a new dimension of RL parallelism. JigsawRL decomposes each pipeline into a Sub-Stage Graph that exposes the intra-stage and inter-worker imbalance hidden by stage-level systems. On this abstraction, JigsawRL resolves multiplexing interference through dynamic resource allocation, eliminates fragmented utilization by migrating long-tail rollouts across workers, and formulates their coordination as a graph scheduling problem solved with a look-ahead heuristic. On 4-64 H100/A100 GPUs across different agentic RL pipelines and models, JigsawRL achieves up to 1.85x throughput over Verl on synchronous RL, 1.54x over StreamRL and AReaL on asynchronous RL, and supports heterogeneous pipelines with a moderate latency trade-off.
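The idea of migrating long-tail rollouts across workers can be illustrated with a toy load balancer. The sketch below is not JigsawRL's algorithm, which the abstract does not spell out. It is a minimal greedy heuristic under stated assumptions: each worker holds in-flight rollouts measured by a hypothetical "remaining tokens" count, migration is free, and a rollout cannot be split. The names `Worker` and `migrate_long_tail` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical rollout worker; `remaining` holds tokens left per in-flight rollout."""
    name: str
    remaining: list[int] = field(default_factory=list)

    @property
    def load(self) -> int:
        return sum(self.remaining)


def migrate_long_tail(workers: list[Worker], max_moves: int = 100) -> int:
    """Greedily move rollouts from the busiest worker to the idlest one.

    Assumption: migration cost is zero. A real system would also have to
    account for moving generation state (for example the KV cache).
    Returns the number of migrations performed.
    """
    moves = 0
    while moves < max_moves:
        busiest = max(workers, key=lambda w: w.load)
        idlest = min(workers, key=lambda w: w.load)
        gap = busiest.load - idlest.load
        # Moving a rollout of size r lowers the pair's maximum load only if r < gap.
        candidates = [r for r in busiest.remaining if r < gap]
        if not candidates:
            break
        # Prefer the rollout that leaves the two workers closest to balanced.
        pick = min(candidates, key=lambda r: abs(gap / 2 - r))
        busiest.remaining.remove(pick)
        idlest.remaining.append(pick)
        moves += 1
    return moves


if __name__ == "__main__":
    pool = [
        Worker("A", [900, 800, 700]),  # one worker stuck with the long tail
        Worker("B", [100]),
        Worker("C", [50]),
    ]
    before = max(w.load for w in pool)
    n = migrate_long_tail(pool)
    after = max(w.load for w in pool)
    print(f"{n} migrations, max load {before} -> {after}")
```

In this toy setting the longest-running worker determines when a synchronous step can finish, so lowering the maximum load is a rough proxy for reducing idle time on the other workers.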

Problem

Research questions and friction points this paper is trying to address.

RL Post-Training

Pipeline Parallelism

Resource Imbalance

Throughput Optimization

LLM Training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pipeline Multiplexing

Sub-Stage Graph

Dynamic Resource Allocation