Accelerating Multi-Task Temporal Difference Learning under Low-Rank Representation

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Policy evaluation in multi-task reinforcement learning suffers from slow convergence and poor generalization due to high-dimensional, task-specific value function representations. Method: the paper proposes a novel temporal-difference (TD) learning method that exploits the low-rank structure shared across tasks' value functions. Its core innovation is the explicit integration of truncated singular value decomposition (SVD) into the TD update loop: after each iteration, the value function parameter matrix is projected onto a low-rank subspace via SVD truncation, thereby constraining updates to a shared low-dimensional manifold. Contribution/Results: the authors establish theoretical convergence guarantees, proving the algorithm retains standard TD's $\mathcal{O}(\ln(t)/t)$ convergence rate. Empirically, when the rank $r$ is small, the method achieves significantly faster convergence and higher estimation accuracy than baseline approaches, and the performance gains grow as $r$ decreases. This work establishes a principled, theoretically grounded, and empirically effective TD optimization paradigm for low-rank multi-task RL.

📝 Abstract
We study policy evaluation problems in multi-task reinforcement learning (RL) under a low-rank representation setting. In this setting, we are given $N$ learning tasks whose corresponding value functions lie in an $r$-dimensional subspace, with $r<N$. One can apply the classic temporal-difference (TD) learning method to solve these problems, where the value function of each task is learned independently. In this paper, we are interested in understanding whether one can exploit the low-rank structure of the multi-task setting to accelerate the performance of TD learning. To answer this question, we propose a new variant of the TD learning method in which we integrate a so-called truncated singular value decomposition (SVD) step into the TD update. This additional step enables TD learning to exploit the dominant directions induced by the low-rank structure when updating the iterates, thereby improving its performance. Our empirical results show that the proposed method significantly outperforms classic TD learning, and the performance gap increases as the rank $r$ decreases. From a theoretical point of view, introducing the truncated SVD step into TD learning might cause instability in the updates. We provide a theoretical result showing that this instability does not occur. Specifically, we prove that the proposed method converges at a rate $\mathcal{O}(\frac{\ln(t)}{t})$, where $t$ is the number of iterations. This rate matches that of standard TD learning.
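The update scheme described in the abstract can be sketched in a few lines. The sketch below assumes linear value-function approximation with one parameter column per task (a $d \times N$ matrix), a synchronous TD(0) update across tasks, and a rank-$r$ projection via truncated SVD after each step; the names (`td_svd_step`, `project_rank`, etc.) are illustrative and not from the paper.

```python
import numpy as np

def project_rank(theta, r):
    """Project the d x N parameter matrix onto its best rank-r approximation."""
    U, S, Vt = np.linalg.svd(theta, full_matrices=False)
    return U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

def td_svd_step(theta, phi_s, phi_next, rewards, gamma, alpha, r):
    """One synchronous TD(0) update across all N tasks, then SVD truncation.

    theta:    (d, N) value-function parameters, one column per task
    phi_s:    (d, N) feature vector of the current state, per task
    phi_next: (d, N) feature vector of the next state, per task
    rewards:  (N,)   observed rewards, per task
    """
    # Per-task value estimates V(s) = phi(s)^T theta_n
    v_s = np.einsum('dn,dn->n', phi_s, theta)
    v_next = np.einsum('dn,dn->n', phi_next, theta)
    # TD errors: r_n + gamma * V(s') - V(s)
    delta = rewards + gamma * v_next - v_s
    # Standard per-task TD(0) update
    theta = theta + alpha * phi_s * delta[None, :]
    # Truncated-SVD step: keep iterates on a shared rank-r subspace
    return project_rank(theta, r)
```

The projection keeps every iterate inside a shared $r$-dimensional subspace, which is the mechanism the paper credits for faster convergence when $r$ is small.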
Problem

Research questions and friction points this paper is trying to address.

How can TD learning in multi-task RL be accelerated by exploiting shared low-rank structure?
Does integrating truncated SVD into the TD update improve performance in practice?
Does the added SVD step destabilize TD learning, or can convergence still be guaranteed?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates truncated singular value decomposition into TD learning
Exploits low-rank structure to accelerate learning performance
Proves stability and an $\mathcal{O}(\ln(t)/t)$ convergence rate matching standard TD