AI Summary
To address the challenge of policy learning under scarce target-task samples in high-dimensional, non-stationary, finite-horizon MDPs, this paper proposes an offline transfer reinforcement learning framework tailored to Q-function learning. Methodologically, it introduces a novel re-targeting mechanism that enables vertical, multi-step cascading of transferred information; integrates heterogeneous offline source datasets; supports both batch and online learning modes; and models the source-target relationship via a similarity assumption. Theoretically, it establishes the first rigorous guarantees of accelerated Q-function convergence rates under offline transfer, as well as reduced regret bounds in the offline-to-online transfer setting. Empirical evaluations on synthetic benchmarks and real-world clinical datasets demonstrate that the proposed method significantly outperforms standard Q-learning and state-of-the-art transfer RL baselines.
Abstract
We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks. We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies. The proposed transferred $Q$-learning algorithm contains a novel re-targeting step that enables vertical information cascading along the multiple steps of an RL task, in addition to the usual horizontal information gathering of transfer learning (TL) for supervised learning. We establish the first theoretical justifications of TL in RL tasks by showing a faster convergence rate of $Q$-function estimation in offline RL transfer and a lower regret bound in offline-to-online RL transfer under certain similarity assumptions. Empirical evidence from both synthetic and real datasets is presented to support the proposed algorithms and our theoretical results.
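For intuition, the sketch below shows one way the transferred $Q$-learning idea could look in code: a finite-horizon fitted-$Q$ backward induction in which, at every step, the regression labels of both target and source samples are re-computed ("re-targeted") using the transferred next-step $Q$ estimate, a pooled least-squares fit aggregates the two data sources, and a target-only correction adjusts for source-target bias. This is a minimal illustration under assumed linear $Q$-functions; the featurization, the ridge estimator, the pooled-then-correct structure, and all function names are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal sketch (not the paper's exact method): transferred fitted-Q
# iteration for a finite-horizon MDP with linear Q-functions. Source and
# target batches are pooled at each step, followed by a target-only
# correction. The "re-targeting" idea is mimicked by recomputing every
# regression label -- including those of the source samples -- with the
# transferred estimate of the next-step Q-function, so information
# cascades backward (vertically) through the horizon.
import numpy as np

def features(states, actions, n_actions):
    """One-hot action interacted with state features (toy featurization)."""
    n, d = states.shape
    phi = np.zeros((n, d * n_actions))
    for i, a in enumerate(actions):
        phi[i, a * d:(a + 1) * d] = states[i]
    return phi

def q_values(w, states, n_actions):
    """Q(s, a) for every action under linear weights w."""
    d = states.shape[1]
    return np.stack(
        [states @ w[a * d:(a + 1) * d] for a in range(n_actions)], axis=1
    )

def ridge(X, y, lam=1e-3):
    """Ridge-regularized least squares."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def transferred_fitted_q(target_batches, source_batches, n_actions, horizon):
    """target_batches[t] / source_batches[t]: dicts with keys s, a, r, s_next."""
    d = target_batches[0]["s"].shape[1]
    w = [np.zeros(d * n_actions) for _ in range(horizon + 1)]  # w[horizon] = 0
    for t in reversed(range(horizon)):
        def labels(batch):
            # Re-targeting: labels use the *transferred* next-step Q estimate.
            v_next = q_values(w[t + 1], batch["s_next"], n_actions).max(axis=1)
            return batch["r"] + v_next
        tgt, src = target_batches[t], source_batches[t]
        X_tgt = features(tgt["s"], tgt["a"], n_actions)
        X_src = features(src["s"], src["a"], n_actions)
        # Step 1: pooled estimate on target + source samples (horizontal transfer).
        X_pool = np.vstack([X_tgt, X_src])
        y_pool = np.concatenate([labels(tgt), labels(src)])
        w_pool = ridge(X_pool, y_pool)
        # Step 2: target-only correction of the source-induced bias.
        delta = ridge(X_tgt, labels(tgt) - X_tgt @ w_pool)
        w[t] = w_pool + delta
    return w[:horizon]
```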