Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This paper investigates the finite-sample statistical convergence rate of distributional temporal difference (TD) learning with linear function approximation, which estimates the return distribution under a fixed policy π in a discounted Markov decision process. Whereas prior statistical analyses focus on the tabular setting, the paper derives sharp finite-sample rates for linear distributional TD and shows that its sample complexity matches that of classic linear TD learning, so learning the full return distribution is statistically no harder than learning its expectation. The analysis combines a fine-grained study of the linear-categorical Bellman equation with exponential stability arguments for products of random matrices. The results give rigorous theoretical guarantees for distributional RL with linear function approximation.


📝 Abstract

In this paper, we investigate the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy π. Prior works on the statistical analysis of distributional TD learning mainly focus on the tabular case. In contrast, we first consider the linear function approximation setting and derive sharp finite-sample rates. Our theoretical results demonstrate that the sample complexity of linear distributional TD learning matches that of classic linear TD learning. This implies that, with linear function approximation, learning the full distribution of the return using streaming data is no more difficult than learning its expectation (i.e., the value function). To derive tight sample complexity bounds, we conduct a fine-grained analysis of the linear-categorical Bellman equation and employ exponential stability arguments for products of random matrices. Our findings provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.
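
To make the setting concrete, here is a minimal sketch of a categorical TD update with linear features. It is an illustration under stated assumptions, not the paper's exact algorithm: the predicted probability vector over a fixed, evenly spaced grid of return atoms is assumed to be linear in the features (`phi @ theta`), the target is the standard categorical projection of `reward + gamma * Z`, and the names `categorical_projection` and `linear_ctd_step` are hypothetical.

```python
import numpy as np


def categorical_projection(atoms, probs, reward, gamma):
    """Project the law of reward + gamma * Z onto the fixed grid `atoms`.

    `atoms` is an evenly spaced, increasing grid; `probs[j]` is the mass
    that Z places on `atoms[j]`. Each shifted atom splits its mass between
    its two neighbouring grid points by linear interpolation.
    """
    k = len(atoms)
    gap = atoms[1] - atoms[0]
    # Shift and scale the atoms, then clip to the range of the grid.
    shifted = np.clip(reward + gamma * atoms, atoms[0], atoms[-1])
    pos = (shifted - atoms[0]) / gap          # fractional grid index
    lower = np.floor(pos).astype(int)
    upper = np.minimum(lower + 1, k - 1)
    w_upper = pos - lower                     # weight on the upper neighbour
    out = np.zeros(k)
    np.add.at(out, lower, probs * (1.0 - w_upper))
    np.add.at(out, upper, probs * w_upper)
    return out


def linear_ctd_step(theta, phi_s, phi_next, reward, atoms, gamma, alpha):
    """One semi-gradient categorical TD step (assumed form, for illustration).

    `theta` has shape (d, K): feature dimension by number of atoms.
    """
    pred = phi_s @ theta                      # predicted masses at state s
    next_pred = phi_next @ theta              # predicted masses at next state
    target = categorical_projection(atoms, next_pred, reward, gamma)
    return theta + alpha * np.outer(phi_s, target - pred)
```

With one-hot features this sketch reduces to tabular categorical TD. For example, in a single-state chain with reward 1 and discount 0.5, repeatedly calling `linear_ctd_step` with `phi_s = phi_next = np.array([1.0])` on the grid `np.linspace(0.0, 4.0, 5)` drives the predicted masses toward a point mass at the return value 2.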

Problem

Research questions and friction points this paper is trying to address.

Analyzes finite-sample rates of distributional TD learning.

Focuses on linear function approximation for return distribution.

Compares complexity with classic linear TD learning.
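
For reference, the classic linear TD learning baseline mentioned above is usually written as the following TD(0) update. The notation is an assumption introduced here for illustration: \phi is the feature map, \theta_t the weight vector, \alpha the step size, \gamma the discount factor, and (s_t, r_t, s_t') a sampled transition under π.

```latex
\theta_{t+1}
  = \theta_t
  + \alpha \bigl( r_t + \gamma\, \phi(s_t')^{\top} \theta_t
                  - \phi(s_t)^{\top} \theta_t \bigr)\, \phi(s_t)
```

The value estimate at state s is \phi(s)^{\top}\theta_t, so this update tracks only the expected return, whereas distributional TD tracks the whole return distribution.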

Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear function approximation

Exponential stability arguments

Sharp finite-sample rates

🔎 Similar Papers

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning