Fitted Q-Iteration via Max-Plus-Linear Approximation

📅 2024-09-12

🏛️ IEEE Control Systems Letters

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper targets convergence and computational cost in Q-function learning for offline reinforcement learning of discounted Markov decision processes. It uses max-plus-linear (MPL) approximators within the fitted Q-iteration (FQI) framework and proposes FQI algorithms with provable convergence. Because the Bellman operator is compatible with max-plus operations, the max-plus-linear regression in each iteration reduces to simple max-plus matrix-vector multiplications. A variational implementation further yields a per-iteration complexity that is independent of the number of samples.

📝 Abstract

In this study, we consider the application of max-plus-linear approximators for the Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we incorporate these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider a variational implementation of the proposed algorithm, which leads to a per-iteration complexity that is independent of the number of samples.
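
To make the phrase "max-plus matrix-vector multiplication" concrete, here is a minimal NumPy sketch. It shows only the standard max-plus product, in which addition is replaced by max and multiplication by addition, and how a max-plus-linear function of the form max_i (f_i + theta_i) can be evaluated with it. The function names, the feature matrix, and the parameter vector are hypothetical illustrations. This is not the paper's algorithm, and the abstract does not specify the basis functions or the regression step.

```python
import numpy as np


def maxplus_matvec(A, x):
    """Max-plus product: result[i] = max_j (A[i, j] + x[j])."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    # Broadcast x across the rows of A, then take the row-wise maximum.
    return np.max(A + x[np.newaxis, :], axis=1)


def maxplus_linear_eval(features, theta):
    """Evaluate max_i (features[n, i] + theta[i]) for every sample n.

    `features` (hypothetical) holds basis-function values per sample;
    `theta` (hypothetical) holds the max-plus-linear parameters.
    """
    return maxplus_matvec(features, theta)


# Small worked example with made-up numbers.
A = np.array([[0.0, -1.0],
              [2.0, 0.5]])
x = np.array([1.0, 3.0])
y = maxplus_matvec(A, x)
# Row 0: max(0 + 1, -1 + 3) = 2.0; row 1: max(2 + 1, 0.5 + 3) = 3.5
print(y)
```

Evaluating the approximator on a batch of samples is therefore a single max-plus matrix-vector product, the same primitive the abstract says each regression step reduces to.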

Problem

Research questions and friction points this paper is trying to address.

Offline Reinforcement Learning

Q-function Optimization

Robot Decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fitted Q Iteration

Convergence Proof

Computational Efficiency

🔎 Similar Papers

Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning