Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

📅 2024-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In deep reinforcement learning, one-step Bellman updates propagate value information slowly, limiting sample efficiency. To address this, the paper proposes iterated $Q$-Network (i-QN), a framework that learns a tailored sequence of Q-functions in which each serves as the regression target for the next, so that multiple consecutive Bellman updates are carried out jointly. i-QN is theoretically grounded and compatible with both value-based and actor-critic methods. Evaluated on Atari 2600 games and MuJoCo continuous-control benchmarks, i-QN improves both sample efficiency and asymptotic performance, demonstrating that multi-step target updates scale beyond discrete-action domains to high-dimensional continuous control.

📝 Abstract
The vast majority of Reinforcement Learning methods are strongly affected by the computational effort and data requirements needed to obtain effective estimates of action-value functions, which in turn determine the quality of the overall performance and the sample efficiency of the learning procedure. Typically, action-value functions are estimated through an iterative scheme that alternates the application of an empirical approximation of the Bellman operator with a subsequent projection step onto a chosen function space. It has been observed that this scheme can be generalized to carry out multiple iterations of the Bellman operator at once, benefiting the underlying learning algorithm. However, until now, it has been challenging to implement this idea effectively, especially in high-dimensional problems. In this paper, we introduce iterated $Q$-Network (i-QN), a novel, principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions where each serves as the target for the next. We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods. We empirically demonstrate the advantages of i-QN in Atari $2600$ games and MuJoCo continuous control problems.
Problem

Research questions and friction points this paper is trying to address.

Enhances action-value function estimation efficiency
Generalizes multiple Bellman operator iterations
Improves performance in high-dimensional reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterated Q-Network enables multiple Bellman updates
Tailored sequence of action-value functions
Seamless use in value-based and actor-critic methods
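The mechanism the bullets above describe (a tailored chain of Q-functions where each serves as the regression target for the next, all trained jointly) can be sketched on a toy problem. This is a minimal illustration under assumptions not taken from the paper: a hand-made two-state MDP, exact Bellman backups instead of sampled transitions, and tabular Q-values in place of deep networks.

```python
import numpy as np

gamma, K = 0.9, 3                       # discount factor, chain length (both illustrative)
P = np.array([[0, 1], [0, 1]])          # P[s, a] = deterministic next state (toy MDP)
R = np.array([[0.0, 1.0], [0.0, 1.0]])  # R[s, a] = immediate reward (toy MDP)

def bellman(Q):
    """Optimal Bellman backup: (TQ)(s, a) = r(s, a) + gamma * max_a' Q(s', a')."""
    return R + gamma * Q[P].max(axis=-1)

# A chain of K+1 action-value functions; each Q_{k+1} is regressed toward
# the Bellman backup of its predecessor Q_k, and all are updated jointly.
Qs = [np.zeros((2, 2)) for _ in range(K + 1)]
lr = 0.5
for _ in range(200):
    targets = [bellman(Qs[k]) for k in range(K)]  # T Q_0, ..., T Q_{K-1}
    for k in range(K):
        # Gradient step on the squared regression loss ||Q_{k+1} - T Q_k||^2
        Qs[k + 1] += lr * (targets[k] - Qs[k + 1])

# After training, Q_k approximates the k-th Bellman iterate T^k Q_0,
# i.e. K consecutive Bellman updates have been learned in one loop.
print(np.round(Qs[K], 3))
```

Here each level of the chain distills one more application of the Bellman operator, which is the generalization of the usual one-iteration-then-project scheme described in the abstract; the actual method additionally handles sampled transitions, target-network copies, and neural function approximation.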