Improved Model-based Reinforcement Learning with Smooth Kernels

📅 2026-05-08
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the limited expressivity of existing model-based reinforcement learning methods in continuous state-action spaces, which often rely on restrictive low-rank MDP assumptions. The authors propose a nonparametric model-based algorithm grounded in kernel smoothing that leverages the Lipschitz continuity of the underlying MDP. By integrating a Bernstein-type exploration bonus into finite-horizon online learning, the method achieves tighter regret bounds than prior state-of-the-art approaches, particularly improving dependence on the planning horizon. Notably, this is the first successful fusion of Bernstein-style exploration with kernel smoothing, supported by a novel martingale concentration inequality. The resulting theoretical advances offer independent value beyond the specific algorithmic framework.
📝 Abstract
For continuous state-action spaces, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficiency guarantees at the expense of restrictive structural assumptions. Kernel-smoothing model-based approaches offer a promising alternative that instead leverages the smoothness of the MDP and employs nonparametric kernel-smoothing estimates of the transition dynamics. This paper proposes a new kernel-smoothing model-based approach for online reinforcement learning in finite-horizon settings under Lipschitz continuity assumptions on the MDP. By incorporating a Bernstein-style exploration bonus into the kernel-smoothing framework, our method achieves a regret bound that improves upon the state of the art in its dependence on the horizon. The theoretical advance relies on a delicate analysis of the interplay between Bernstein-style bonuses and kernel smoothing, and the new tight Bernstein-type concentration inequality for martingales it uses may be of independent interest.
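To make the two ingredients named in the abstract concrete, here is a minimal sketch of a kernel-smoothed estimate of the expected next-step value at a query state-action point, paired with a Bernstein-shaped bonus (a variance term plus a range term scaled by the horizon). It is not the paper's algorithm: the Gaussian kernel, the effective-sample-size normalization, the constants, and the names `kernel_weights` and `smoothed_value_and_bonus` are all assumptions made for illustration, and the bias term that a Lipschitz analysis would normally add is omitted.

```python
import numpy as np


def kernel_weights(query, points, bandwidth):
    """Normalized Gaussian kernel weights of `points` around `query`.

    Assumption: a Gaussian kernel is used purely for illustration.
    """
    sq_dist = np.sum((points - query) ** 2, axis=1)
    # Shift by the smallest distance so at least one raw weight equals 1,
    # which avoids dividing by zero when every point is far from the query.
    raw = np.exp(-(sq_dist - sq_dist.min()) / (2.0 * bandwidth ** 2))
    return raw / raw.sum()


def smoothed_value_and_bonus(query, points, next_values, bandwidth,
                             horizon, delta=0.05):
    """Kernel-smoothed mean of `next_values` plus a Bernstein-shaped bonus.

    `points` holds past state-action features (one row per sample) and
    `next_values` holds the value observed after each of them.
    The constants below follow the textbook Bernstein inequality and are
    illustrative only, not taken from the paper.
    """
    w = kernel_weights(query, points, bandwidth)
    mean = float(w @ next_values)
    variance = float(w @ (next_values - mean) ** 2)
    # Effective number of samples contributing to the weighted average.
    n_eff = 1.0 / float(np.sum(w ** 2))
    log_term = np.log(1.0 / delta)
    variance_part = np.sqrt(2.0 * variance * log_term / n_eff)
    range_part = horizon * log_term / (3.0 * n_eff)
    return mean, variance_part + range_part


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(size=(200, 3))       # hypothetical (state, action) features
    next_values = rng.uniform(size=200) * 10  # hypothetical next-step values
    query = np.full(3, 0.5)
    mean, bonus = smoothed_value_and_bonus(query, points, next_values,
                                           bandwidth=0.2, horizon=10)
    print(f"estimate={mean:.3f}, bonus={bonus:.3f}")
```

In this sketch the variance term shrinks when nearby samples agree, so the bonus is dominated by the faster-decaying range term. That is the general reason a Bernstein-style bonus can be tighter than one that scales only with the horizon.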
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
continuous state-action space
Markov decision processes
kernel smoothing
regret bound
Innovation

Methods, ideas, or system contributions that make the work stand out.

kernel smoothing
model-based reinforcement learning
Bernstein bonus
regret bound
Lipschitz MDP