Planning in entropy-regularized Markov decision processes and games

📅 2026-04-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes SmoothCruiser, a planning algorithm for estimating the value function in entropy-regularized Markov decision processes (MDPs) and two-player zero-sum games, given a generative model of the environment. By exploiting the smoothness that entropy regularization induces in the associated Bellman operators, SmoothCruiser achieves a problem-independent sample complexity of $\widetilde{O}(1/\varepsilon^4)$ for a desired accuracy $\varepsilon$. The guarantee applies to the regularized setting only: for unregularized MDPs and games, no algorithm with guaranteed polynomial worst-case sample complexity is known.

📝 Abstract
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\widetilde{O}(1/\varepsilon^4)$ for a desired accuracy $\varepsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
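To illustrate what "smoothness of the Bellman operator" can mean here, the following is a minimal sketch, not the SmoothCruiser algorithm itself. It assumes the standard log-sum-exp form of the entropy-regularized backup, $\lambda \log \sum_a \exp(Q(s,a)/\lambda)$, with a regularization strength `lam`. The function names and the parameter `lam` are hypothetical and chosen for illustration only. Unlike the hard maximum used without regularization, this backup is differentiable everywhere, and its gradient is a softmax distribution over actions.

```python
import numpy as np

def soft_backup(q, lam):
    """Assumed entropy-regularized backup: lam * log(sum_a exp(q[a] / lam)).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    q = np.asarray(q, dtype=float)
    m = q.max()
    return m + lam * np.log(np.exp((q - m) / lam).sum())

def soft_backup_grad(q, lam):
    """Gradient of soft_backup with respect to q: a softmax over actions."""
    q = np.asarray(q, dtype=float)
    z = np.exp((q - q.max()) / lam)
    return z / z.sum()

def hard_backup(q):
    """Unregularized backup: a plain maximum, not differentiable at ties."""
    return float(np.max(q))

# Hypothetical action values for a single state.
q_values = [0.2, 1.0, 0.9]
for lam in (1.0, 0.1, 0.01):
    print(lam, soft_backup(q_values, lam), soft_backup_grad(q_values, lam))
print("max:", hard_backup(q_values))
```

For $K$ actions, the soft backup lies between the hard maximum and the hard maximum plus $\lambda \log K$, so it approaches the unregularized backup as `lam` shrinks, while its gradient changes more and more abruptly.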
Problem

Research questions and friction points this paper is trying to address.

entropy-regularized
Markov decision processes
two-player games
value function estimation
sample complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

entropy regularization
SmoothCruiser
Bellman operator smoothness
sample complexity
generative model