End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of existing efficient reinforcement learning algorithms for linearly Bellman-complete Markov decision processes (MDPs), which typically require either small action spaces or strong oracle assumptions over the feature space. Focusing on such MDPs with deterministic transitions, stochastic initial states, and stochastic rewards, the paper proposes the first provably efficient algorithm for this setting: it is end-to-end efficient for finite action spaces, and for large or infinite action spaces it requires only a standard argmax oracle over actions. Combining linear function approximation, the Bellman completeness assumption, and polynomial-time policy optimization, the method learns an $\varepsilon$-optimal policy with sample and computational complexity polynomial in the horizon, feature dimension, and $1/\varepsilon$.

📝 Abstract
We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying \emph{linear Bellman completeness} -- a fundamental setting where the Bellman backup of any linear value function remains linear. While statistically tractable, prior computationally efficient algorithms are either limited to small action spaces or require strong oracle assumptions over the feature space. We provide a computationally efficient algorithm for linear Bellman complete MDPs with \emph{deterministic transitions}, stochastic initial states, and stochastic rewards. For finite action spaces, our algorithm is end-to-end efficient; for large or infinite action spaces, we require only a standard argmax oracle over actions. Our algorithm learns an $\varepsilon$-optimal policy with sample and computational complexity polynomial in the horizon, feature dimension, and $1/\varepsilon$.
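The setting in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm, only a minimal backward least-squares value iteration in a small deterministic-transition MDP with linear features and an argmax action oracle; the feature map `phi`, transition `f`, and reward `r` below are hypothetical stand-ins chosen so that the Bellman backup of a linear Q-function stays linear.

```python
import numpy as np

d = 3            # feature dimension (constant, state, action)
H = 3            # horizon
actions = [0, 1] # finite action set

def phi(s, a):
    # hypothetical linear features for a state-action pair
    return np.array([1.0, s, float(a)])

def f(s, a):
    # hypothetical deterministic transition
    return s + a

def r(s, a):
    # deterministic stand-in for the stochastic reward
    return 0.1 * float(a)

def argmax_oracle(theta, s):
    # standard argmax oracle: best action under the linear Q estimate
    return max(actions, key=lambda a: phi(s, a) @ theta)

# Backward least-squares value iteration over a small state set
states = [0.0, 1.0, 2.0]
thetas = [np.zeros(d) for _ in range(H + 1)]  # thetas[H] = 0 (terminal)
for h in range(H - 1, -1, -1):
    X, y = [], []
    for s in states:
        for a in actions:
            s_next = f(s, a)
            a_next = argmax_oracle(thetas[h + 1], s_next)
            # regression target: reward plus greedy next-step value
            y.append(r(s, a) + phi(s_next, a_next) @ thetas[h + 1])
            X.append(phi(s, a))
    thetas[h], *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

greedy_a0 = argmax_oracle(thetas[0], 0.0)  # greedy first action at s = 0
```

In this toy, the regression targets are exactly linear in the features at every step, so the fitted Q-functions are exact and the greedy policy picks the rewarding action (`a = 1`) at every stage; the Bellman completeness assumption guarantees precisely this closure under backups in the general linear case.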
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
linear Bellman completeness
deterministic transitions
efficient RL
function approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

linear Bellman completeness
deterministic transitions
end-to-end efficient RL
function approximation
argmax oracle