🤖 AI Summary
Standard Prioritized Experience Replay (PER) does not distinguish experiences by how reliably their priorities reflect learning potential, resulting in suboptimal sampling efficiency. To address this, the paper proposes Reliability-adjusted Prioritized Experience Replay (ReaPER), which introduces a temporal-difference (TD) error reliability measure, dynamically adjusting sample priorities based on the stability of TD errors so as to preferentially select high-information, low-noise transitions. Theoretically, ReaPER achieves superior sample efficiency compared to standard PER under mild assumptions. Empirically, ReaPER demonstrates faster convergence and improved final performance across benchmark domains, including Atari-5, without requiring auxiliary networks or additional hyperparameter tuning.
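To make the idea concrete, here is a minimal sketch of a replay buffer that weights PER-style priorities by a TD-error reliability term. The exact ReaPER formula is not reproduced here; the inverse-variance reliability weight, the `history` window, and all class and parameter names are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import deque

class ReliabilityAdjustedBuffer:
    """Toy replay buffer: priority = |TD error|^alpha scaled by a
    reliability weight derived from the variance of each transition's
    recent TD errors (illustrative assumption, not the ReaPER formula)."""

    def __init__(self, capacity=10000, history=5, alpha=0.6, eps=1e-6):
        self.capacity, self.history = capacity, history
        self.alpha, self.eps = alpha, eps
        self.transitions = []   # stored (s, a, r, s_next, done) tuples
        self.td_histories = []  # recent TD errors per transition

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.td_histories.pop(0)
        self.transitions.append(transition)
        self.td_histories.append(deque([td_error], maxlen=self.history))

    def _priority(self, i):
        hist = self.td_histories[i]
        mean = sum(hist) / len(hist)
        var = sum((d - mean) ** 2 for d in hist) / len(hist)
        reliability = 1.0 / (1.0 + var)  # stable TD errors -> weight near 1
        return (abs(hist[-1]) + self.eps) ** self.alpha * reliability

    def sample(self, k):
        pri = [self._priority(i) for i in range(len(self.transitions))]
        total = sum(pri)
        return random.choices(range(len(self.transitions)),
                              weights=[p / total for p in pri], k=k)

    def update(self, indices, td_errors):
        # Refresh TD-error histories after a learning step.
        for i, d in zip(indices, td_errors):
            self.td_histories[i].append(d)
```

Transitions whose TD errors fluctuate heavily (high variance, i.e. noisy learning signal) receive a reliability weight below 1 and are sampled less often than transitions with equally large but stable TD errors.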
📝 Abstract
Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-5 benchmark.