Laminar: A Scalable Asynchronous RL Post-Training Framework

πŸ“… 2025-10-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing asynchronous RL systems for LLM post-training suffer from two critical bottlenecks: (1) trajectory generation exhibits an extreme long-tail latency distribution, severely underutilizing GPU resources; and (2) rigid global weight synchronization fails to adapt to the highly skewed and evolving trajectory latencies, limiting scalability. This paper introduces Laminar, a trajectory-level asynchronous training system tailored for LLM RL post-training. It features: (1) relay-worker-based, fine-grained, lock-free, trajectory-level parameter synchronization; (2) a dynamic repack mechanism that adaptively mitigates long-tail latency by consolidating stragglers onto dedicated rollouts; and (3) a fully decoupled architecture with a distributed parameter service, enabling high-concurrency, independent trajectory generation and consumption. Evaluated on a 1024-GPU cluster, Laminar achieves up to a 5.48× throughput improvement over state-of-the-art systems, reduces time-to-convergence, and improves robustness for long-running training jobs.

πŸ“ Abstract
Reinforcement learning (RL) post-training for Large Language Models (LLMs) is now scaling to large clusters and running for extended durations to enhance model reasoning performance. However, the scalability of existing RL frameworks is limited, as extreme long-tail skewness in RL trajectory generation causes severe GPU underutilization. Current asynchronous RL systems attempt to mitigate this, but they rely on global weight synchronization between the actor and all rollouts, which creates a rigid model update schedule. This global synchronization is ill-suited for the highly skewed and evolving distribution of trajectory generation latency in RL training, crippling training efficiency. Our key insight is that efficient scaling requires breaking this lockstep through trajectory-level asynchrony, which generates and consumes each trajectory independently. We propose Laminar, a scalable and robust RL post-training system built on a fully decoupled architecture. First, we replace global updates with a tier of relay workers acting as a distributed parameter service. This enables asynchronous and fine-grained weight synchronization, allowing rollouts to pull the latest weight anytime without stalling the actor's training loop. Second, a dynamic repack mechanism consolidates long-tail trajectories onto a few dedicated rollouts, maximizing generation throughput. The fully decoupled design also isolates failures, ensuring robustness for long-running jobs. Our evaluation on a 1024-GPU cluster shows that Laminar achieves up to 5.48× training throughput speedup over state-of-the-art systems, while reducing model convergence time.
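The relay-worker idea in the abstract can be sketched as a versioned parameter store that rollouts pull from at will, while the actor consumes trajectories one at a time with no global barrier. This is a minimal toy sketch, not the paper's implementation: the class name `RelayParameterStore`, the threading model, and the placeholder "gradient step" are all assumptions for illustration.

```python
import threading
import queue

class RelayParameterStore:
    """Illustrative stand-in for Laminar's relay-worker tier: holds the
    latest versioned weights so rollouts can fetch them at any time
    without stalling the actor's training loop."""
    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = {"w": 0.0}

    def push(self, weights):
        # Actor publishes a new weight version after each update step.
        with self._lock:
            self._version += 1
            self._weights = dict(weights)

    def pull(self):
        # Rollouts read the freshest (version, weights) snapshot.
        with self._lock:
            return self._version, dict(self._weights)

def actor_loop(store, traj_queue, steps=5):
    """Actor consumes trajectories as they arrive (trajectory-level,
    no batch barrier) and pushes updated weights to the relay store."""
    w = 0.0
    for _ in range(steps):
        traj = traj_queue.get()       # consume one trajectory independently
        w += 0.1 * traj["reward"]     # placeholder update, not a real gradient
        store.push({"w": w})

def rollout_loop(store, traj_queue, n_traj=5):
    """Rollout pulls the latest weights before each generation,
    independently of all other rollouts."""
    for _ in range(n_traj):
        version, _ = store.pull()     # never blocks on other rollouts
        traj_queue.put({"reward": 1.0, "weight_version": version})

store = RelayParameterStore()
q = queue.Queue()
t_rollout = threading.Thread(target=rollout_loop, args=(store, q))
t_actor = threading.Thread(target=actor_loop, args=(store, q))
t_rollout.start(); t_actor.start()
t_rollout.join(); t_actor.join()
print(store.pull())  # final (version, weights)
```

The point of the sketch is the decoupling: neither loop waits for a cluster-wide synchronization event, so a slow rollout delays only its own trajectories.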
Problem

Research questions and friction points this paper is trying to address.

Addresses GPU underutilization from skewed RL trajectory generation
Eliminates rigid global synchronization in asynchronous RL systems
Improves scalability and robustness for large-scale RL post-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled architecture with distributed parameter service
Dynamic repack mechanism for long-tail trajectories
Asynchronous weight synchronization without global updates
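The dynamic repack mechanism above might be approximated by a greedy consolidation step: gather the unfinished long-tail trajectories and pack them onto as few rollout instances as possible, freeing the rest for fresh batches. The function name, data shapes, and greedy policy here are illustrative assumptions, not the paper's algorithm.

```python
def repack_long_tail(inflight, capacity):
    """Greedy sketch of dynamic repacking.

    inflight: dict mapping rollout-instance id -> list of unfinished
              (long-tail) trajectory ids currently hosted there.
    capacity: max concurrent trajectories one instance can host.
    Returns (packed, freed): the consolidated assignment, and the
    instances freed up to start generating new trajectories.
    """
    # Collect all stragglers in a deterministic order.
    stragglers = [t for inst in sorted(inflight) for t in inflight[inst]]
    packed, freed = {}, []
    for inst in sorted(inflight):
        if stragglers:
            packed[inst], stragglers = stragglers[:capacity], stragglers[capacity:]
        else:
            freed.append(inst)
    return packed, freed

# Three stragglers spread across four instances, two slots per instance:
packed, freed = repack_long_tail(
    {"r0": ["t3"], "r1": ["t7"], "r2": ["t9"], "r3": []}, capacity=2
)
print(packed)  # {'r0': ['t3', 't7'], 'r1': ['t9']}
print(freed)   # ['r2', 'r3']
```

Half the instances are released to serve new work instead of idling behind a few slow generations, which is the throughput win the abstract attributes to repacking.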
πŸ”Ž Similar Papers
No similar papers found.