Regularized Q-Learning with Linear Function Approximation

📅 2024-01-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses a convergence gap for regularized Q-learning under linear function approximation: theoretical guarantees have been lacking because the composite mapping formed by the regularized Bellman operator and the projection onto the span of basis functions is not a contraction under any norm. The authors propose a single-loop, two-timescale stochastic approximation algorithm for this setting. Methodologically, they formulate a bilevel optimization problem: the lower level enforces Bellman's recursive optimality condition, while the upper level performs orthogonal projection onto the linear feature space. Theoretically, the paper establishes a finite-time convergence guarantee to stationary points despite the non-contractivity of the composed operator, and, under Markovian noise, derives a performance bound for the induced policy. These results provide theoretical support for the stable use of regularized reinforcement learning with linear function approximation.
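As a concrete reading of the summary, the following minimal Python sketch illustrates the single-loop, two-timescale idea on a toy MDP. Everything here (the MDP, feature dimension, step-size schedules, and the choice to keep the fast iterate full-dimensional for clarity) is an illustrative assumption, not the authors' exact algorithm: the fast iterate chases the entropy-regularized (soft) Bellman equation, while the slower iterate chases the projection of that estimate onto the feature span.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (all sizes and constants are hypothetical illustration choices).
nS, nA = 5, 2                                  # states, actions
gamma, tau = 0.9, 0.1                          # discount, entropy temperature
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a distribution over s'
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # rewards r(s, a)
Phi = rng.standard_normal((nS * nA, 3))        # linear features, d = 3

def soft_max(q_row, tau):
    """Log-sum-exp 'soft maximum' used by the entropy-regularized Bellman operator."""
    return tau * np.log(np.sum(np.exp(q_row / tau)))

# Two-timescale iterates:
#   q -- fast, full-dimensional estimate tracking the soft Bellman equation
#        (kept tabular here purely to keep the sketch readable)
#   w -- slow, weights of the projection of q onto span(Phi)
q = np.zeros(nS * nA)
w = np.zeros(Phi.shape[1])

s = 0
for t in range(1, 200_000):
    alpha = 1.0 / t ** 0.6   # 'fast' step size (lower level)
    beta = 0.5 / t ** 0.9    # 'slow' step size (upper level), decays faster
    a = rng.integers(nA)                  # uniform behavior policy
    s_next = rng.choice(nS, p=P[s, a])    # Markovian transition
    idx = s * nA + a

    # Fast update: stochastic soft Bellman backup at the visited pair (s, a).
    target = R[s, a] + gamma * soft_max(q[s_next * nA:(s_next + 1) * nA], tau)
    q[idx] += alpha * (target - q[idx])

    # Slow update: stochastic gradient step toward the projection of q onto span(Phi).
    phi = Phi[idx]
    w += beta * (q[idx] - phi @ w) * phi

    s = s_next

print("projected soft Q-values:", np.round(Phi @ w, 3))
```

The design point mirrored from the summary is the step-size separation: beta decays faster than alpha, so the slow projection weights effectively see a converged solution of the Bellman recursion.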

📝 Abstract
Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation, the convergence properties of learning algorithms for regularized MDPs (e.g., soft Q-learning) are not well understood because the composition of the regularized Bellman operator and a projection onto the span of basis vectors is not a contraction with respect to any norm. In this paper, we consider a bi-level optimization formulation of regularized Q-learning with linear functional approximation. The lower level optimization problem aims to identify a value function approximation that satisfies Bellman's recursive optimality condition and the upper level aims to find the projection onto the span of basis vectors. This formulation motivates a single-loop algorithm with finite time convergence guarantees. The algorithm operates on two time-scales: updates to the projection of state-action values are 'slow' in that they are implemented with a step size that is smaller than the one used for 'faster' updates of approximate solutions to Bellman's recursive optimality equation. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
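To make the abstract's objects concrete, here is one standard way to write the entropy-regularized Bellman operator and a bilevel reading of the lower/upper split (generic notation chosen for illustration; the paper's exact formulation may differ):

```latex
% Entropy-regularized (soft) Bellman optimality operator, temperature \tau > 0:
(T_\tau Q)(s,a)
  = r(s,a)
  + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
      \left[ \tau \log \sum_{a' \in A} \exp\!\big( Q(s',a') / \tau \big) \right]

% The composition \Pi T_\tau, where \Pi is the orthogonal projection onto
% span(\Phi), is in general not a contraction in any norm -- the obstruction
% the abstract refers to.

% A bilevel reading of the lower/upper split:
%   lower level:  find Q with  Q = T_\tau Q            (Bellman optimality)
%   upper level:  \min_{w} \; \tfrac{1}{2} \| \Phi w - Q \|_{\mu}^{2}   (projection),
% where \mu denotes the sampling distribution over state-action pairs.
```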
Problem

Research questions and friction points this paper is trying to address.

Convergence of regularized Q-learning with approximation
Bi-level optimization for value function projection
Algorithm guarantees under Markovian noise conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regularized Q-learning with linear approximation
Bi-level optimization for value function projection
Single-loop algorithm with finite time convergence
Jiachen Xi
Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843
Alfredo Garcia
Texas A&M University
dynamic optimization, game theory, dynamic games
P. Momcilovic
Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843