Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the slow convergence and lack of theoretical stability guarantees in temporal-difference (TD) learning with function approximation when the discount factor is close to one. Focusing on TD learning with linear function approximation, the study investigates a variant that incorporates a baseline-adjusted update rule and demonstrates that the choice of baseline distribution critically influences algorithmic stability. The authors prove that using the state-action visitation distribution as the baseline ensures stability for any non-negative baseline weight and arbitrary discount factor. Furthermore, they establish, for the first time, uniform boundedness of the asymptotic bias and covariance of the parameter estimates in high-discount settings, thereby providing rigorous theoretical foundations for practical reinforcement learning applications.
📝 Abstract
Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied in the tabular setting, stability guarantees with function approximation remain poorly understood. This paper analyzes relative TD learning with linear function approximation. We establish stability conditions for the algorithm and show that the choice of baseline distribution plays a central role. In particular, when the baseline is chosen as the empirical distribution of the state-action process, the algorithm is stable for any non-negative baseline weight and any discount factor. We also provide a sensitivity analysis of the resulting parameter estimates, showing that both the asymptotic bias and the asymptotic covariance remain uniformly bounded as the discount factor approaches one.
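To make the baseline-adjusted update concrete, here is a minimal sketch of relative TD(0) with linear function approximation on a toy Markov reward process. The chain, features, step size, and the use of a running empirical average of the value estimate as the baseline (a stand-in for the visitation-distribution baseline the abstract describes) are all illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hedged sketch of relative TD(0) with linear function approximation.
# All concrete choices (chain, features, gamma, kappa, alpha) are
# illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(0)

n_states, n_features = 5, 3
# Random linear features, one row per state (illustrative choice).
Phi = rng.standard_normal((n_states, n_features))

def step(s):
    """Simple ergodic chain: move left/right uniformly, wrap around."""
    s_next = (s + rng.choice([-1, 1])) % n_states
    reward = 1.0 if s_next == 0 else 0.0
    return s_next, reward

gamma = 0.99       # discount factor close to one
kappa = 1.0        # non-negative baseline weight
alpha = 0.05       # constant step size
theta = np.zeros(n_features)

s = 0
avg_value = 0.0    # running empirical average of the value estimate
for t in range(1, 20001):
    s_next, r = step(s)
    v, v_next = Phi[s] @ theta, Phi[s_next] @ theta
    # The running average plays the role of the baseline term under the
    # empirical (visitation) distribution of the state process.
    avg_value += (v - avg_value) / t
    # Relative TD error: ordinary TD error minus the weighted baseline.
    delta = r + gamma * v_next - kappa * avg_value - v
    theta += alpha * delta * Phi[s]
    s = s_next

print(np.round(Phi @ theta, 3))  # fitted relative values per state
```

With `kappa = 0` this reduces to standard TD(0); a positive `kappa` recenters the value estimates, which is the mechanism the paper analyzes for keeping the iterates well-behaved as the discount factor approaches one.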
Problem

Research questions and friction points this paper is trying to address.

relative temporal-difference learning
stability
function approximation
discount factor
sensitivity analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

relative temporal-difference learning
linear function approximation
stability analysis
sensitivity analysis
discount factor