ADDQ: Adaptive Distributional Double Q-Learning

📅 2025-06-24

📈 Citations: 0

✨ Influential: 0

career value

215K/year

🤖 AI Summary

Overestimation bias in Q-value estimation critically impedes convergence speed in Q-learning and Actor-Critic algorithms, yet existing deep reinforcement learning (DRL) methods lack adaptive mechanisms to mitigate it. To address this, we propose a local adaptive overestimation suppression framework for distributional RL (DRL), built upon a double-Q architecture and augmented with a dynamic error correction module that enables online, localized adjustment of Q-distribution parameters. This work is the first to explicitly embed adaptive overestimation control into the DRL paradigm—achieving simplicity, broad compatibility, and plug-and-play integration with mainstream DRL algorithms (e.g., C51, QR-DQN, IQN) via minimal code modifications. Empirical evaluation across tabular environments, Atari 2600, and MuJoCo benchmarks demonstrates substantial improvements in convergence speed and policy stability, validating both theoretical soundness and generalizability.

Technology Category

Application Category

📝 Abstract

Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down convergence of $Q$-learning and actor-critic methods. One of the reasons of the success of modern RL algorithms is partially a direct or indirect overestimation reduction mechanism. We propose an easy to implement method built on top of distributional reinforcement learning (DRL) algorithms to deal with the overestimation in a locally adaptive way. Our framework is simple to implement, existing distributional algorithms can be improved with a few lines of code. We provide theoretical evidence and use double $Q$-learning to show how to include locally adaptive overestimation control in existing algorithms. Experiments are provided for tabular, Atari, and MuJoCo environments.

Problem

Research questions and friction points this paper is trying to address.

Address Q-value overestimation in reinforcement learning

Propose adaptive method for distributional RL algorithms

Improve existing algorithms with simple code modifications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive distributional double Q-learning method

Locally adaptive overestimation control

Simple implementation with few code changes

🔎 Similar Papers

State-Constrained Offline Reinforcement Learning