When Does Non-Uniform Replay Matter in Reinforcement Learning?

📅 2026-05-11

🤖 AI Summary

This work investigates when and why non-uniform experience replay improves over uniform replay in off-policy reinforcement learning. Across large-scale parallel simulation, single-task, and multi-task experiments, it identifies three governing factors: replay volume (the number of replayed transitions per environment step), expected recency (how recent the sampled transitions are), and the entropy of the replay sampling distribution. Non-uniform replay helps most when replay volume is low, and high-entropy sampling matters even at comparable expected recency. Building on these findings, the authors adopt Truncated Geometric replay, a strategy that biases sampling toward recent experience while preserving high entropy at negligible computational overhead. Evaluated with three modern algorithms on five RL benchmark suites, it improves sample efficiency in low-volume regimes and remains competitive when replay volume is high.

📝 Abstract

Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling, and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent the sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is to clarify when non-uniform replay is beneficial and to provide practical guidance for replay design in modern off-policy RL. Specifically, we find that non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency. Motivated by these findings, we adopt a simple Truncated Geometric replay that biases sampling toward recent experience while preserving high entropy and incurring negligible computational overhead. Across large-scale parallel simulation, single-task, and multi-task settings, including three modern algorithms evaluated on five RL benchmark suites, this replay sampling strategy improves sample efficiency in low-volume regimes while remaining competitive when replay volume is high.
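
To make expected recency and sampling entropy concrete, here is a minimal sketch of one way a truncated geometric sampler over a replay buffer could look. It is an illustration only, not the authors' implementation: the abstract does not give the exact parameterization, so the decay parameter `p`, the age-based indexing (age 0 is the newest transition), the assumption of a full circular buffer, and all function names are hypothetical.

```python
import numpy as np


def truncated_geometric_probs(buffer_len: int, p: float) -> np.ndarray:
    """Sampling probabilities over ages 0..buffer_len-1 (age 0 = newest).

    Assumed form: weight (1 - p)**age, renormalized over the truncated
    support. p = 0 recovers uniform sampling.
    """
    ages = np.arange(buffer_len)
    weights = (1.0 - p) ** ages          # geometric decay with age
    return weights / weights.sum()       # renormalize after truncation


def expected_recency(probs: np.ndarray) -> float:
    """Mean age of a sampled transition (lower means more recent)."""
    return float(np.dot(np.arange(len(probs)), probs))


def sampling_entropy(probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of the sampling distribution."""
    nonzero = probs[probs > 0]
    return float(-(nonzero * np.log(nonzero)).sum())


def sample_indices(rng: np.random.Generator, probs: np.ndarray,
                   newest_index: int, batch_size: int) -> np.ndarray:
    """Draw slots from a full circular buffer of size len(probs).

    newest_index is the slot holding the most recent transition; older
    transitions sit at decreasing slots, wrapping around.
    """
    ages = rng.choice(len(probs), size=batch_size, p=probs)
    return (newest_index - ages) % len(probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    uniform = truncated_geometric_probs(1000, 0.0)
    recent = truncated_geometric_probs(1000, 0.01)
    print("uniform:", expected_recency(uniform), sampling_entropy(uniform))
    print("geometric:", expected_recency(recent), sampling_entropy(recent))
    print(sample_indices(rng, recent, newest_index=42, batch_size=8))
```

In this sketch, raising `p` lowers the mean sampled age (stronger recency bias) but also lowers entropy, which is the trade-off the abstract describes. Sampling is a single categorical draw per batch, so the added cost over uniform sampling is small.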

Problem

Research questions and friction points this paper is trying to address.

non-uniform replay

replay sampling

off-policy reinforcement learning

sample efficiency

replay buffer

Innovation

Methods, ideas, or system contributions that make the work stand out.

non-uniform replay

replay volume

sampling entropy