Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
In value-based reinforcement learning, eliminating the target network reduces memory consumption and frees capacity for the online network, but it often destabilizes training and cuts ties with the theory and methods built around target networks. To address this, the paper proposes iterated Shared Q-Learning (iS-QL), which steps out of the binary choice between target-based and target-free methods: a copy of the online network's last linear layer serves as a lightweight target, while all remaining parameters are shared with the up-to-date online network. This design makes it possible to apply iterated Q-learning, i.e., learning consecutive Bellman iterations in parallel. Empirical results show that iS-QL substantially narrows the performance gap between target-free methods and target-based baselines, improving sample efficiency with a smaller memory footprint and comparable training time, which highlights its potential to scale reinforcement learning research.

📝 Abstract
In value-based reinforcement learning, removing the target network is tempting as the bootstrapped target would be built from up-to-date estimates, and the memory occupied by the target network could be reallocated to expand the capacity of the online network. However, eliminating the target network introduces instability, leading to a decline in performance. Removing the target network also means we cannot leverage the literature developed around target networks. In this work, we propose to use a copy of the last linear layer of the online network as a target network, while sharing the remaining parameters with the up-to-date online network, hence stepping out of the binary choice between target-based and target-free methods. It enables us to leverage the concept of iterated Q-learning, which consists of learning consecutive Bellman iterations in parallel, to reduce the performance gap between target-free and target-based approaches. Our findings demonstrate that this novel method, termed iterated Shared Q-Learning (iS-QL), improves the sample efficiency of target-free approaches across various settings. Importantly, iS-QL requires a smaller memory footprint and comparable training time to classical target-based algorithms, highlighting its potential to scale reinforcement learning research.
Problem

Research questions and friction points this paper is trying to address.

Bridging performance gap between target-free and target-based RL
Reducing instability from removing target networks in RL
Improving sample efficiency with shared Q-learning approach
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses last layer copy as target network
Shares remaining parameters with online network
Leverages iterated Q-learning for efficiency