Planning in entropy-regularized Markov decision processes and games

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes SmoothCruiser, a planning algorithm that estimates the value function in entropy-regularized Markov decision processes (MDPs) and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness that entropy regularization induces in the Bellman operator to obtain a problem-independent sample complexity of order $\widetilde{O}(1/\varepsilon^4)$ for a desired accuracy $\varepsilon$. By contrast, no algorithm with guaranteed polynomial worst-case sample complexity is known for the non-regularized setting; the guarantee here applies to the regularized problem only.

📝 Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\widetilde{O}(1/\varepsilon^4)$ for a desired accuracy $\varepsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
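To illustrate what "smoothness promoted by the regularization" can mean, here is a minimal sketch that assumes a Shannon-entropy regularizer with temperature `lam`. Under that assumption the hard maximum over action values in the Bellman backup becomes a log-sum-exp, which is differentiable, and its maximizing policy is a softmax. The function names, the temperature value, and the toy action values are hypothetical and are not taken from the paper; this is not the SmoothCruiser algorithm itself.

```python
import math

def soft_max_value(q_values, lam):
    """Entropy-regularized maximum: lam * log(sum(exp(q / lam))).

    Assumes a Shannon-entropy regularizer with temperature lam > 0.
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))

def soft_max_policy(q_values, lam):
    """Softmax policy, which attains the regularized maximum above."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Toy action values for a single state (hypothetical numbers).
q = [1.0, 0.5, -0.2]
lam = 0.1

v_soft = soft_max_value(q, lam)
pi = soft_max_policy(q, lam)

# Expected return plus lam times the entropy of pi recovers v_soft.
entropy = -sum(p * math.log(p) for p in pi if p > 0)
v_from_policy = sum(p * x for p, x in zip(pi, q)) + lam * entropy
```

In this sketch the regularized value lies between `max(q)` and `max(q) + lam * log(len(q))`, so a small `lam` stays close to the unregularized maximum while still giving a smooth function of the action values.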

Problem

Research questions and friction points this paper is trying to address.

entropy-regularized

Markov decision processes

two-player games

value function estimation

sample complexity

Innovation

Methods, ideas, or system contributions that make the work stand out.

entropy regularization

SmoothCruiser

Bellman operator smoothness