🤖 AI Summary
This work addresses the substantial memory overhead of key-value (KV) caching in large language models during reinforcement learning, which arises from long-sequence rollouts and hinders efficient training on resource-constrained hardware. To mitigate this, the authors propose a sparse rollout mechanism that integrates sparsity-aware rejection sampling with importance reweighting to effectively correct off-policy bias induced by KV cache compression. This approach ensures training stability while significantly reducing memory consumption. The method establishes an end-to-end sparse reinforcement learning framework that maintains competitive model performance despite drastically lower rollout memory usage and enhances robustness for deployment under sparse inference conditions.
📝 Abstract
Reinforcement Learning (RL) has become essential for eliciting complex reasoning capabilities in Large Language Models (LLMs). However, the substantial memory overhead of storing Key-Value (KV) caches during long-horizon rollouts acts as a critical bottleneck, often prohibiting efficient training on limited hardware. While existing KV compression techniques offer a remedy for inference, directly applying them to RL training induces a severe policy mismatch, leading to catastrophic performance collapse. To address this, we introduce Sparse-RL, a framework that enables stable RL training under sparse rollouts. We show that the instability arises from a fundamental policy mismatch among the dense old policy, the sparse sampler policy, and the learner policy. To mitigate this issue, Sparse-RL incorporates Sparsity-Aware Rejection Sampling and Importance-based Reweighting to correct the off-policy bias introduced by compression-induced information loss. Experimental results show that Sparse-RL reduces rollout overhead compared to dense baselines while preserving performance. Furthermore, Sparse-RL inherently implements sparsity-aware training, significantly enhancing model robustness under sparse inference deployment.
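To make the correction mechanism concrete, the sketch below illustrates the general idea of combining rejection sampling with importance reweighting when tokens are sampled under a sparse (KV-compressed) policy but optimized under the learner policy. This is an illustrative sketch only; the function name, thresholds, and the exact form of the correction are assumptions, not the paper's implementation.

```python
import numpy as np

def sparse_rollout_correction(logp_learner, logp_sampler, advantages,
                              reject_threshold=5.0, clip=10.0):
    """Illustrative off-policy correction for sparse rollouts.

    Tokens were sampled under a sparse (KV-compressed) sampler policy.
    Each token is reweighted toward the learner policy by the importance
    ratio pi_learner / pi_sampler; tokens whose ratio diverges beyond a
    threshold are rejected (masked to zero) to bound the variance.
    NOTE: all names and default thresholds here are hypothetical.
    """
    logp_learner = np.asarray(logp_learner, dtype=float)
    logp_sampler = np.asarray(logp_sampler, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    ratio = np.exp(logp_learner - logp_sampler)   # per-token importance weight
    keep = ratio < reject_threshold               # rejection-sampling mask
    weights = np.clip(ratio, 0.0, clip) * keep    # reweight the kept tokens
    return weights * advantages                   # corrected per-token objective

# Toy usage: three tokens with per-token log-probs and unit advantages.
obj = sparse_rollout_correction(
    logp_learner=np.array([-1.0, -0.5, -3.0]),
    logp_sampler=np.array([-1.1, -0.6, -0.2]),
    advantages=np.array([1.0, 1.0, 1.0]),
)
```

In this toy example the third token, which the sparse sampler over-samples relative to the learner, is down-weighted rather than rejected; rejection only triggers when the learner assigns much *more* mass than the sampler, where importance weights would otherwise explode.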