On the Divergence of Differential Temporal Difference Learning without Local Clocks

📅 2026-05-07

🤖 AI Summary
This study addresses an open problem posed by Wan et al.: whether the Differential Temporal Difference (DTD) algorithm for average-reward reinforcement learning behaves the same under a global clock as it does under a local clock. It establishes, for the first time, that the two clock mechanisms are not equivalent in this setting. Using a specific Markov decision process constructed as a counterexample, it shows that DTD converges under a local clock based on state visitation counts but can diverge under a global clock. The result clarifies the role of learning rate scheduling in algorithmic stability and underscores the importance of a local clock for reliable convergence in average-reward learning.
📝 Abstract
The learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $\alpha_t$, which depends only on the time step $t$ (i.e., a global clock). The latter is of the form $\alpha_{\nu(S_t, t)}$, where $\nu(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, convergence with a local clock and convergence with a global clock go hand in hand: we are not aware of any algorithm that converges with one but not the other. The key contribution of this work is to show that this correspondence breaks down in average-reward RL. Specifically, we construct a counterexample showing that although differential temporal difference learning is convergent with a local clock, it can diverge with a global clock. This counterexample closes the open problem posed in Wan et al. [2021] and Blaser et al. [2026].
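To make the distinction between the two clocks concrete, the following is a minimal sketch of tabular differential TD learning in which the only difference between the two modes is the index passed to the step-size schedule. It assumes the on-policy update form from Wan et al. [2021] (TD error $R_{t+1} - \bar{R}_t + V(S_{t+1}) - V(S_t)$, with the average-reward estimate updated by $\eta$ times the same step). The schedule $\alpha_n = 1/n$, the function names, the parameter `eta`, and the two-state chain are all hypothetical choices for illustration; this is not the counterexample from the paper and is not expected to reproduce the divergence.

```python
import random


def step_size(n):
    """Hypothetical schedule alpha_n = 1 / n for n >= 1 (an assumption)."""
    return 1.0 / n


def differential_td(transitions, rewards, num_steps, clock="local",
                    eta=1.0, seed=0):
    """Tabular differential TD on a Markov reward process.

    transitions[s] holds the next-state probabilities from state s.
    rewards[s] is the reward received on leaving state s (an assumption).
    clock="global" uses alpha_t; clock="local" uses alpha_{nu(S_t, t)}.
    """
    rng = random.Random(seed)
    num_states = len(transitions)
    values = [0.0] * num_states      # differential value estimates V
    avg_reward = 0.0                 # average-reward estimate R-bar
    visits = [0] * num_states        # nu(s, t): visits to s up to time t
    state = 0

    for t in range(1, num_steps + 1):
        visits[state] += 1
        next_state = rng.choices(range(num_states),
                                 weights=transitions[state])[0]
        reward = rewards[state]

        # Differential TD error.
        delta = reward - avg_reward + values[next_state] - values[state]

        # The only place where the two clocks differ.
        n = visits[state] if clock == "local" else t
        alpha = step_size(n)

        values[state] += alpha * delta
        avg_reward += eta * alpha * delta
        state = next_state

    return values, avg_reward, visits


if __name__ == "__main__":
    # Hypothetical two-state chain, used only to exercise the code.
    P = [[0.5, 0.5], [0.5, 0.5]]
    r = [0.0, 1.0]
    for clock in ("local", "global"):
        v, r_bar, _ = differential_td(P, r, num_steps=10_000, clock=clock)
        print(clock, v, r_bar)
```

Under the local clock each state's step size decays only when that state is visited, so rarely visited states keep larger steps for longer; under the global clock every state shares the same decaying step regardless of how often it has been updated.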
Problem

Research questions and friction points this paper is trying to address.

differential temporal difference learning
average-reward reinforcement learning
global clock
local clock
convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

differential temporal difference learning
global clock
local clock
average-reward reinforcement learning
convergence and divergence