ADDQ: Adaptive Distributional Double Q-Learning

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Overestimation bias in Q-value estimation is a well-known obstacle that slows convergence of Q-learning and actor-critic algorithms, yet most deep reinforcement learning methods lack adaptive mechanisms to mitigate it. To address this, the paper proposes a locally adaptive overestimation-control framework for distributional reinforcement learning, built on a double-Q architecture and augmented with a dynamic correction module that adjusts Q-distribution parameters online and per state-action region. The work explicitly embeds adaptive overestimation control into the distributional RL paradigm, achieving simplicity, broad compatibility, and plug-and-play integration with mainstream distributional algorithms (e.g., C51, QR-DQN, IQN) via minimal code modifications. Empirical evaluation across tabular environments, Atari 2600, and MuJoCo benchmarks demonstrates substantial improvements in convergence speed and policy stability, supporting both the theoretical analysis and the method's generalizability.

📝 Abstract
Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down convergence of $Q$-learning and actor-critic methods. Part of the success of modern RL algorithms stems from direct or indirect overestimation-reduction mechanisms. We propose an easy-to-implement method, built on top of distributional reinforcement learning (DRL) algorithms, that deals with overestimation in a locally adaptive way. The framework is simple to implement: existing distributional algorithms can be improved with a few lines of code. We provide theoretical evidence and use double $Q$-learning to show how to include locally adaptive overestimation control in existing algorithms. Experiments are provided for tabular, Atari, and MuJoCo environments.
Problem

Research questions and friction points this paper is trying to address.

Address Q-value overestimation in reinforcement learning
Propose adaptive method for distributional RL algorithms
Improve existing algorithms with simple code modifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive distributional double Q-learning method
Locally adaptive overestimation control
Simple implementation with few code changes
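The page does not spell out the adaptive rule itself, so the following is only a hypothetical tabular sketch of the general idea: two $Q$-tables, as in double $Q$-learning, are blended per action according to their local disagreement, used here as a stand-in proxy for locally adaptive overestimation control. The blending rule, variable names, and toy environment are all assumptions for illustration, not the authors' actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.1

# Two independent Q-tables, as in standard double Q-learning.
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))

def adaptive_target(Q1, Q2, s_next):
    """Blend the two estimates per action based on their local disagreement:
    large disagreement leans toward the pessimistic minimum, small
    disagreement toward the plain average. (Hypothetical proxy for the
    paper's locally adaptive control.)"""
    disagreement = np.abs(Q1[s_next] - Q2[s_next])
    w = disagreement / (disagreement.max() + 1e-8)  # weight in [0, 1]
    blended = w * np.minimum(Q1[s_next], Q2[s_next]) \
        + (1.0 - w) * 0.5 * (Q1[s_next] + Q2[s_next])
    return blended[int(np.argmax(blended))]

def update(s, a, r, s_next):
    # Randomly pick which table to update, as in double Q-learning.
    if rng.random() < 0.5:
        QA[s, a] += alpha * (r + gamma * adaptive_target(QA, QB, s_next) - QA[s, a])
    else:
        QB[s, a] += alpha * (r + gamma * adaptive_target(QB, QA, s_next) - QB[s, a])

# Toy rollout on random transitions, just to exercise the update rule.
for _ in range(1000):
    update(s=int(rng.integers(n_states)), a=int(rng.integers(n_actions)),
           r=float(rng.normal()), s_next=int(rng.integers(n_states)))
```

In this sketch the per-action disagreement weight is what makes the correction "local": state-action pairs where the two estimators agree are treated near-optimistically, while pairs with large disagreement are corrected toward the minimum.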
Authors
Leif Döring, Professor, University of Mannheim (stochastic process theory, reinforcement learning)
Benedikt Wille, PhD student at University of Mannheim (reinforcement learning)
Maximilian Birr, Institute of Mathematics, University of Mannheim, Germany
Mihail Bîrsan, Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
Martin Slowik, Institute of Mathematics, University of Mannheim, Germany