🤖 AI Summary
This work addresses the instability of gradient temporal difference (GTD) learning when the feature interaction matrix (FIM) is singular. To resolve this issue, the authors propose a regularized GTD algorithm (R-GTD), which introduces a regularization term into the mean squared projected Bellman error (MSPBE) objective. This approach provides, for the first time within the GTD framework, a robust solution that remains stable even when the FIM is non-invertible, guaranteeing convergence to a unique solution. Theoretical analysis establishes the convergence of R-GTD and derives an explicit error bound. Empirical results further demonstrate the superior stability and effectiveness of the proposed method in scenarios involving singular FIMs.
📝 Abstract
Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. In this paper, we propose a regularized optimization objective by reformulating the mean-square projected Bellman error (MSPBE) minimization. This formulation naturally yields a regularized GTD algorithm, referred to as R-GTD, which guarantees convergence to a unique solution even when the FIM is singular. We establish theoretical convergence guarantees and explicit error bounds for the proposed method, and validate its effectiveness through empirical experiments.
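To see why regularization restores uniqueness when the FIM is singular, consider the following minimal numerical sketch. It is illustrative only, not the paper's actual R-GTD update: `A` stands in for a rank-deficient FIM-like matrix, `lam` is a hypothetical regularization strength, and the ridge-style normal equations `(AᵀA + λI)θ = Aᵀb` mimic how a regularized objective yields a unique solution where the unregularized system `Aθ = b` is ill-posed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a singular 4x4 matrix: column 3 duplicates column 2,
# so the matrix is rank-deficient (like a singular FIM).
A = rng.standard_normal((4, 4))
A[:, 3] = A[:, 2]
b = rng.standard_normal(4)
assert np.linalg.matrix_rank(A) < 4  # unregularized system is ill-posed

lam = 1e-2  # regularization strength (hypothetical choice)

# Regularized normal equations: (A^T A + lam * I) theta = A^T b.
# The left-hand matrix is symmetric positive definite for any lam > 0,
# so the solution is unique even though A itself is singular.
M = A.T @ A + lam * np.eye(4)
theta = np.linalg.solve(M, A.T @ b)

residual = np.linalg.norm(M @ theta - A.T @ b)
```

The same mechanism underlies the abstract's claim: adding a regularization term to the MSPBE objective turns a possibly degenerate linear system into a well-posed one with a single minimizer, at the cost of a bias that the paper's error bound quantifies.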