Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning

📅 2025-06-06

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

To address pervasive overestimation bias, low sample efficiency, and training instability in Deep Q-Networks (DQNs), this paper proposes Elastic Ensemble Deep Q-Network (EEDQN). EEDQN is the first method to systematically integrate elastic step-size updates with multi-step TD returns within a Q-value ensemble framework, dynamically adapting the bootstrapping horizon to jointly mitigate overestimation and improve sample utilization. Technically, it unifies DQN, multi-step TD learning, Q-value ensembling, and adaptive step-size updating. Evaluated across the full MinAtar benchmark suite, EEDQN achieves significantly higher final returns than standard DQN and matches or surpasses state-of-the-art ensemble DQN variants. Moreover, it demonstrates superior training stability and generalization robustness.

Technology Category

Application Category

📝 Abstract

While many algorithmic extensions to Deep Q-Networks (DQN) have been proposed, there remains limited understanding of how different improvements interact. In particular, multi-step and ensemble style extensions have shown promise in reducing overestimation bias, thereby improving sample efficiency and algorithmic stability. In this paper, we introduce a novel algorithm called Ensemble Elastic Step DQN (EEDQN), which unifies ensembles with elastic step updates to stabilise algorithmic performance. EEDQN is designed to address two major challenges in deep reinforcement learning: overestimation bias and sample efficiency. We evaluated EEDQN against standard and ensemble DQN variants across the MinAtar benchmark, a set of environments that emphasise behavioral learning while reducing representational complexity. Our results show that EEDQN achieves consistently robust performance across all tested environments, outperforming baseline DQN methods and matching or exceeding state-of-the-art ensemble DQNs in final returns on most of the MinAtar environments. These findings highlight the potential of systematically combining algorithmic improvements and provide evidence that ensemble and multi-step methods, when carefully integrated, can yield substantial gains.

Problem

Research questions and friction points this paper is trying to address.

Addresses overestimation bias in deep reinforcement learning

Improves sample efficiency in value-based RL algorithms

Combines ensemble and multi-step methods for stable performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines ensembles with elastic step updates

Reduces overestimation bias in DQN

Improves sample efficiency and stability

🔎 Similar Papers

Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning