🤖 AI Summary
This work addresses the low sample efficiency and training instability of experience replay in reinforcement learning. It systematically introduces the random reshuffling (RR) mechanism, previously shown to yield superior convergence properties in supervised learning, into RL experience replay for the first time. The proposed RR-based extensions apply to both uniform and prioritized replay buffers, overcoming the statistical redundancy and convergence limitations inherent in traditional independent, with-replacement sampling. Theoretical analysis shows accelerated convergence under RR. Empirical evaluation within the DQN framework on the Atari benchmark shows that, compared with standard prioritized sampling, the proposed approach significantly improves sample efficiency, accelerates convergence, and enhances both policy performance and training stability, establishing a new paradigm for experience replay in reinforcement learning.
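The summary mentions an RR-style extension of *prioritized* replay. The paper's exact construction is not described here, but one natural way to combine priorities with once-per-pass visitation is to draw each pass's visit order from the priority distribution without replacement. The function name, the `alpha` exponent, and this whole scheme are illustrative assumptions, not necessarily the authors' method:

```python
import numpy as np

def prioritized_rr_order(priorities, alpha=0.6, rng=None):
    """Illustrative sketch: a without-replacement visit order biased by priority.

    Every buffer index appears exactly once per pass (the RR property),
    while higher-priority transitions tend to be visited earlier.
    This is a hypothetical construction, not the paper's algorithm.
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(priorities, dtype=float) ** alpha  # temper priorities
    p /= p.sum()                                      # normalize to a distribution
    # Sample all indices without replacement, weighted by priority.
    return rng.choice(len(p), size=len(p), replace=False, p=p)
```

Consuming this order sequentially for one pass, then redrawing it, preserves the "each transition seen once per epoch" structure of RR while still favoring high-error transitions early in each pass.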
📝 Abstract
Experience replay is a key component of reinforcement learning for stabilizing learning and improving sample efficiency. Its typical implementation samples transitions with replacement from a replay buffer. In contrast, in supervised learning with a fixed dataset, it is common practice to shuffle the dataset every epoch and consume the data sequentially, a scheme called random reshuffling (RR). RR enjoys theoretically better convergence properties and has been shown to outperform with-replacement sampling empirically. To leverage the benefits of RR in reinforcement learning, we propose sampling methods that extend RR to experience replay, in both uniform and prioritized settings. We evaluate our sampling methods on Atari benchmarks, demonstrating their effectiveness in deep reinforcement learning.
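The uniform RR scheme the abstract describes (shuffle once per pass, consume sequentially) can be sketched as a minimal replay buffer. The class and method names below are illustrative, not the paper's implementation:

```python
import random

class RRReplayBuffer:
    """Minimal sketch of uniform random-reshuffling (RR) experience replay.

    Instead of sampling transitions i.i.d. with replacement, we shuffle the
    buffer's indices once per "epoch" and consume them sequentially, so every
    stored transition is visited exactly once before any is revisited.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self._write = 0    # ring-buffer write position
        self._order = []   # shuffled index order for the current pass
        self._cursor = 0   # position within the shuffled order

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self._write] = transition  # overwrite oldest slot
        self._write = (self._write + 1) % self.capacity

    def _reshuffle(self):
        # Start a new pass over the current buffer contents.
        self._order = list(range(len(self.storage)))
        random.shuffle(self._order)
        self._cursor = 0

    def sample(self, batch_size):
        batch = []
        while len(batch) < batch_size:
            if self._cursor >= len(self._order):
                self._reshuffle()
            batch.append(self.storage[self._order[self._cursor]])
            self._cursor += 1
        return batch
```

Within one pass, consecutive batches partition the buffer, which is exactly the statistical property (no redundant revisits) that distinguishes RR from with-replacement sampling.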