Reliability-Adjusted Prioritized Experience Replay

📅 2025-06-23
🤖 AI Summary
Uniform experience replay samples transitions without regard to their learning potential, and Prioritized Experience Replay (PER) addresses this by prioritizing on temporal-difference (TD) error alone. To extend PER, we propose Reliability-adjusted PER (ReaPER), which introduces a measure of TD-error reliability and uses it to adjust sample priorities, so that high-information, low-noise transitions are preferred. Theoretically, ReaPER enables more efficient learning than PER. Empirically, ReaPER outperforms PER across several environment types, including the Atari-5 benchmark.

📝 Abstract
Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-5 benchmark.
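
To make the idea of reliability-adjusted prioritization concrete, here is a minimal sketch. The proportional rule p_i = (|δ_i| + ε)^α is the standard PER priority. Everything about the reliability term is an assumption made for illustration: the abstract does not define the reliability measure, so the per-transition `reliabilities` in [0, 1], the exponent `omega`, and the multiplicative combination are hypothetical and should not be read as the paper's formula.

```python
import random


def sampling_probabilities(td_errors, reliabilities=None,
                           alpha=0.6, omega=1.0, eps=1e-6):
    """Return replay sampling probabilities for each stored transition.

    With reliabilities=None this is standard proportional PER:
    p_i = (|td_error_i| + eps) ** alpha, normalized to sum to 1.
    If reliabilities are given, each priority is additionally scaled by
    reliability_i ** omega. That scaling is a hypothetical stand-in,
    not the measure defined in the paper.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    if reliabilities is not None:
        priorities = [p * (r ** omega)
                      for p, r in zip(priorities, reliabilities)]
    total = sum(priorities)
    return [p / total for p in priorities]


# Four stored transitions; the two in the middle have equal TD error,
# but the third is assumed to have an unreliable TD error.
td_errors = [0.1, 2.0, 2.0, 0.5]
reliabilities = [1.0, 1.0, 0.2, 1.0]  # illustrative values only

per_probs = sampling_probabilities(td_errors)
adj_probs = sampling_probabilities(td_errors, reliabilities)

# Draw a minibatch of indices (with replacement) under the adjusted scheme.
random.seed(0)
batch = random.choices(range(len(td_errors)), weights=adj_probs, k=2)
```

Under plain PER the two transitions with equal TD error are sampled equally often; under the adjusted scheme the one flagged as unreliable is sampled less often, which is the qualitative behavior the abstract describes.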

Problem

Research questions and friction points this paper is trying to address.

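Uniform replay sampling ignores differences in experience-specific learning potential
PER prioritizes by temporal difference error without accounting for how reliable that error is
Can reliability-adjusted prioritization improve learning across diverse environment types?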

Innovation

Methods, ideas, or system contributions that make the work stand out.

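Introduces a novel measure of temporal difference error reliability
Shows theoretically that the resulting algorithm, ReaPER, enables more efficient learning than PER
Outperforms PER empirically across various environment types, including the Atari-5 benchmark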