Uncertainty Prioritized Experience Replay

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prioritized Experience Replay (PER) is sensitive to noise because it relies on temporal-difference (TD) errors for prioritization, leading to the "noisy-TV" problem, degraded sample efficiency, and policy instability. To address this, the paper proposes replacing the TD error with epistemic uncertainty as the basis for experience prioritization, which the authors present as the first systematic incorporation of epistemic uncertainty into PER's priority mechanism. The approach combines Quantile Regression DQN (QR-DQN) with tabular uncertainty modeling to distinguish genuine learning signals from stochastic noise. Empirical evaluation on multi-armed bandits, noisy grid-world environments, and the Atari benchmark shows improvements in both sample efficiency and policy stability, with the method consistently outperforming standard QR-DQN across all domains.
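The core idea in the summary above, swapping TD error for epistemic uncertainty as the replay priority, can be sketched as a minimal buffer. This is an illustrative sketch only, not the authors' implementation; the `uncertainty` values passed in are assumed to come from some external epistemic-uncertainty estimator.

```python
import random

class UncertaintyReplayBuffer:
    """Minimal prioritized replay buffer where priorities come from an
    epistemic-uncertainty estimate instead of the TD error (a sketch of
    the paper's idea, not the authors' implementation)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.transitions = []
        self.priorities = []

    def add(self, transition, uncertainty):
        # Store new transitions with their current epistemic uncertainty.
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(max(uncertainty, 1e-6))

    def sample(self, batch_size):
        # Sampling probability ~ uncertainty^alpha: transitions the agent
        # can still learn from are drawn more often, while irreducibly
        # noisy ones (low epistemic uncertainty) fade out.
        weights = [p ** self.alpha for p in self.priorities]
        indices = random.choices(
            range(len(self.transitions)), weights=weights, k=batch_size
        )
        return indices, [self.transitions[i] for i in indices]

    def update(self, indices, new_uncertainties):
        # After a learning step, refresh priorities with updated estimates.
        for i, u in zip(indices, new_uncertainties):
            self.priorities[i] = max(u, 1e-6)
```

Under this scheme a transition generated by an unpredictable random process keeps producing large TD errors but low epistemic uncertainty, so it stops dominating the sampling distribution.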

📝 Abstract
Prioritized experience replay, which improves sample efficiency by selecting relevant transitions to update parameter estimates, is a crucial component of contemporary value-based deep reinforcement learning models. Typically, transitions are prioritized based on their temporal difference error. However, this approach is prone to favoring noisy transitions, even when the value estimation closely approximates the target mean. This phenomenon resembles the noisy TV problem postulated in the exploration literature, in which exploration-guided agents get stuck by mistaking noise for novelty. To mitigate the disruptive effects of noise in value estimation, we propose using epistemic uncertainty estimation to guide the prioritization of transitions from the replay buffer. Epistemic uncertainty quantifies the uncertainty that can be reduced by learning, hence reducing transitions sampled from the buffer generated by unpredictable random processes. We first illustrate the benefits of epistemic uncertainty prioritized replay in two tabular toy models: a simple multi-arm bandit task, and a noisy gridworld. Subsequently, we evaluate our prioritization scheme on the Atari suite, outperforming quantile regression deep Q-learning benchmarks; thus forging a path for the use of uncertainty prioritized replay in reinforcement learning agents.
Problem

Research questions and friction points this paper is trying to address.

Prioritizing transitions based on noisy temporal-difference errors.
Mitigating the disruptive effect of noise on value estimation using epistemic uncertainty.
Improving reinforcement learning with uncertainty-guided experience replay.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prioritizes transitions by epistemic uncertainty instead of TD error.
Reduces the impact of noise on value estimation.
Outperforms quantile regression DQN benchmarks on the Atari suite.
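The contributions above hinge on separating reducible (epistemic) from irreducible (aleatoric) uncertainty. One common way to approximate this with quantile critics, shown here as an illustration and not necessarily the paper's exact estimator, is to treat disagreement between ensemble heads as epistemic uncertainty, while the quantile spread within a single head reflects aleatoric noise.

```python
import numpy as np

def epistemic_uncertainty(quantile_preds):
    """Rough epistemic-uncertainty proxy for an ensemble of quantile
    critics (illustrative; not the paper's exact estimator).

    quantile_preds: array of shape (n_heads, n_quantiles), each head's
    predicted return quantiles for one (state, action) pair.
    """
    # Each head's expected return is the mean over its quantiles.
    head_means = quantile_preds.mean(axis=1)
    # Variance of the head means measures disagreement between heads:
    # it shrinks as the heads converge, i.e. as uncertainty is learned away.
    return head_means.var()
```

If all heads agree, the estimate is zero even when each head predicts a wide (noisy) return distribution, which is exactly the property needed to stop prioritizing unlearnable transitions.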