🤖 AI Summary
This work investigates whether computationally efficient reinforcement learning algorithms exist for Markov decision processes (MDPs) with deterministic dynamics, large action spaces, stochastic initial states, and stochastic rewards, under the linear Bellman completeness framework. To address error amplification, a key challenge in value estimation, the authors propose the first computationally efficient (polynomial-time) optimistic value iteration algorithm for this setting: it injects structured random noise *only* into the null space of the training data during least-squares regression, yielding strictly optimistic value estimates without excessive conservatism. The method combines linear function approximation with optimistic value iteration and achieves a regret bound of $\tilde{O}(\sqrt{d^3 H^3 T})$, where $d$ is the feature dimension, $H$ the horizon, and $T$ the total number of time steps. The linear Bellman completeness setting unifies classical models, including linear MDPs and linear quadratic regulators (LQR), and this result removes a computational bottleneck in a large-action-space regime previously known only to be statistically tractable.
📝 Abstract
We study computationally and statistically efficient reinforcement learning algorithms for the linear Bellman complete setting. This setting uses linear function approximation to capture value functions and unifies existing models such as linear Markov Decision Processes (MDPs) and Linear Quadratic Regulators (LQRs). While prior work has shown that this setting is statistically tractable, it remained open whether a computationally efficient algorithm exists. Our work provides a computationally efficient algorithm for the linear Bellman complete setting that works for MDPs with large action spaces, random initial states, and random rewards, but relies on the underlying dynamics being deterministic. Our approach is based on randomization: we inject random noise into least-squares regression problems to perform optimistic value iteration. Our key technical contribution is to carefully design the noise so that it acts only in the null space of the training data, ensuring optimism while circumventing a subtle error-amplification issue.
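The core mechanism, perturbing a least-squares estimate only in directions the training data does not constrain, can be sketched in NumPy. This is a minimal illustration on hypothetical toy data, not the paper's algorithm: the noise scale `sigma` and the Gaussian noise distribution are placeholder assumptions, whereas the paper designs the noise carefully to guarantee optimism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data: n feature vectors in d dimensions.
# The features are deliberately rank-deficient (rank 5 < d = 8), so the
# training data leaves some directions of the parameter space unconstrained.
n, d = 20, 8
Phi = rng.normal(size=(n, 5)) @ rng.normal(size=(5, d))
y = Phi @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Ordinary least-squares fit (minimum-norm solution).
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Null space of the training data: right singular vectors of Phi whose
# singular values are (numerically) zero.
_, s, Vt = np.linalg.svd(Phi, full_matrices=True)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T  # columns span null(Phi)

# Inject random noise ONLY along the null-space directions.
sigma = 1.0  # placeholder scale; the paper calibrates this to ensure optimism
theta_noisy = theta_hat + N @ (sigma * rng.normal(size=N.shape[1]))

# Predictions on the training data are unchanged by the perturbation,
# so no regression error is amplified on seen directions ...
assert np.allclose(Phi @ theta_noisy, Phi @ theta_hat)
# ... while the estimate does move in directions the data does not pin down,
# which is where optimism is needed.
assert not np.allclose(theta_noisy, theta_hat)
```

The key property shown by the assertions is that the perturbation is invisible on the training inputs (it lies in `null(Phi)`) yet changes the value estimate on unseen feature directions, which is how the method obtains optimism without the error amplification that naive isotropic noise would cause.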