Regularized Gradient Temporal-Difference Learning

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the instability of gradient temporal-difference (GTD) learning when the feature interaction matrix (FIM) is singular. To resolve this issue, the authors propose a regularized GTD algorithm (R-GTD) that introduces a regularization term into the mean-squared projected Bellman error (MSPBE) objective. Within the GTD framework, this yields, for the first time, a solution that remains stable even when the FIM is non-invertible and is guaranteed to converge to a unique point. Theoretical analysis establishes the convergence of R-GTD and derives an explicit error bound, and empirical results demonstrate the method's superior stability and effectiveness in scenarios involving singular FIMs.
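For context, the standard MSPBE for linear value functions can be written in the well-known quadratic form below, where $C$ is the FIM whose singularity causes the trouble. The paper's exact regularizer is not given in this summary, so the $\ell_2$ term and the shifted inverse shown here are only an illustrative choice:

```latex
% Linear value function V_\theta = \Phi\theta, with
%   A = \mathbb{E}\!\left[\rho\,\phi(\phi - \gamma\phi')^{\top}\right],\quad
%   b = \mathbb{E}\!\left[\rho\, r\,\phi\right],\quad
%   C = \mathbb{E}\!\left[\phi\phi^{\top}\right] \ \text{(the FIM)}.
\mathrm{MSPBE}(\theta) = (A\theta - b)^{\top} C^{-1} (A\theta - b)
% This is well defined only when C is invertible. An illustrative
% regularized variant (our assumption, not necessarily the paper's form):
\mathrm{MSPBE}_{\lambda}(\theta)
  = (A\theta - b)^{\top} (C + \lambda I)^{-1} (A\theta - b)
  + \lambda \lVert \theta \rVert_2^{2}, \qquad \lambda > 0
```

With $\lambda > 0$ the shifted matrix $C + \lambda I$ is always invertible and the added quadratic term makes the objective strongly convex, so a unique minimizer exists even when $C$ itself is singular.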

📝 Abstract
Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. In this paper, we propose a regularized optimization objective by reformulating the mean-squared projected Bellman error (MSPBE) minimization. This formulation naturally yields a regularized GTD algorithm, referred to as R-GTD, which guarantees convergence to a unique solution even when the FIM is singular. We establish theoretical convergence guarantees and explicit error bounds for the proposed method, and validate its effectiveness through empirical experiments.
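To make the setting concrete, the following is a minimal sketch of a GTD2-style update with a ridge term on the main weights, run on a toy chain whose features are deliberately rank-deficient so the FIM is singular. The toy MDP, step sizes, and the `lam * theta` regularizer are all our assumptions for illustration; the paper's actual R-GTD update is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state chain with two features per state. The two feature columns
# are identical, so C = E[phi phi^T] (the FIM) is singular -- the regime
# the paper targets. This toy construction is our assumption.
phi = np.array([[1.0, 1.0],
                [0.5, 0.5],
                [0.0, 0.0]])

gamma, lam = 0.9, 0.1      # discount factor and regularization strength
alpha, beta = 0.05, 0.05   # step sizes for main and auxiliary weights
theta = np.zeros(2)        # value-function weights
w = np.zeros(2)            # auxiliary (correction) weights

for _ in range(5000):
    s = rng.integers(3)
    s_next = (s + 1) % 3            # deterministic chain 0 -> 1 -> 2 -> 0
    r = 1.0 if s_next == 0 else 0.0
    rho = 1.0                       # importance ratio (on-policy here)
    f, f_next = phi[s], phi[s_next]
    delta = r + gamma * f_next @ theta - f @ theta
    # GTD2-style two-timescale updates; the ridge term lam * theta is an
    # illustrative stand-in for the paper's regularizer.
    w += beta * rho * (delta - f @ w) * f
    theta += alpha * (rho * (f - gamma * f_next) * (f @ w) - lam * theta)

print(np.isfinite(theta).all())
```

Without the regularizer, a singular FIM leaves the minimizer non-unique along the null direction of the features; the ridge term pins down a unique bounded solution, which is the behavior the printed check verifies.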
Problem

Research questions and friction points this paper is trying to address.

off-policy policy evaluation
feature interaction matrix
singular matrix
gradient temporal-difference learning
convergence instability
Innovation

Methods, ideas, or system contributions that make the work stand out.

regularized GTD
singular FIM
MSPBE minimization
off-policy evaluation
convergence guarantee
Hyunjun Na
Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, South Korea
Donghwan Lee
KAIST
Decision making, control, and optimization