Experience Replay with Random Reshuffling

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the low sample efficiency and training instability of experience replay in reinforcement learning. We systematically introduce the Random Reshuffling (RR) mechanism—previously shown to yield superior convergence properties in supervised learning—into RL experience replay for the first time. We propose RR-based extensions applicable to both uniform and prioritized replay buffers, overcoming statistical redundancy and convergence limitations inherent in traditional independent, with-replacement sampling. Theoretical analysis demonstrates accelerated convergence under RR. Empirical evaluation within the DQN framework on the Atari benchmark shows that, compared to standard prioritized sampling, our approach significantly improves sample efficiency, accelerates convergence, and simultaneously enhances policy performance and training stability. This work establishes a novel paradigm for experience replay in reinforcement learning.

Technology Category

Application Category

📝 Abstract
Experience replay is a key component in reinforcement learning for stabilizing learning and improving sample efficiency. Its typical implementation samples transitions with replacement from a replay buffer. In contrast, in supervised learning with a fixed dataset, it is a common practice to shuffle the dataset every epoch and consume data sequentially, which is called random reshuffling (RR). RR enjoys theoretically better convergence properties and has been shown to outperform with-replacement sampling empirically. To leverage the benefits of RR in reinforcement learning, we propose sampling methods that extend RR to experience replay, both in uniform and prioritized settings. We evaluate our sampling methods on Atari benchmarks, demonstrating their effectiveness in deep reinforcement learning.
Problem

Research questions and friction points this paper is trying to address.

Extends random reshuffling to reinforcement learning experience replay.
Improves convergence and sample efficiency in deep reinforcement learning.
Evaluates methods on Atari benchmarks for effectiveness.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends random reshuffling to experience replay
Applies reshuffling in uniform and prioritized settings
Evaluates on Atari benchmarks for effectiveness
🔎 Similar Papers
No similar papers found.