🤖 AI Summary
Overestimation bias in Q-value estimation critically impedes convergence speed in Q-learning and Actor-Critic algorithms, yet existing deep reinforcement learning (DRL) methods lack adaptive mechanisms to mitigate it. To address this, we propose a local adaptive overestimation suppression framework for distributional RL (DRL), built upon a double-Q architecture and augmented with a dynamic error correction module that enables online, localized adjustment of Q-distribution parameters. This work is the first to explicitly embed adaptive overestimation control into the DRL paradigm—achieving simplicity, broad compatibility, and plug-and-play integration with mainstream DRL algorithms (e.g., C51, QR-DQN, IQN) via minimal code modifications. Empirical evaluation across tabular environments, Atari 2600, and MuJoCo benchmarks demonstrates substantial improvements in convergence speed and policy stability, validating both theoretical soundness and generalizability.
📝 Abstract
Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down convergence of $Q$-learning and actor-critic methods. One of the reasons of the success of modern RL algorithms is partially a direct or indirect overestimation reduction mechanism. We propose an easy to implement method built on top of distributional reinforcement learning (DRL) algorithms to deal with the overestimation in a locally adaptive way. Our framework is simple to implement, existing distributional algorithms can be improved with a few lines of code. We provide theoretical evidence and use double $Q$-learning to show how to include locally adaptive overestimation control in existing algorithms. Experiments are provided for tabular, Atari, and MuJoCo environments.