🤖 AI Summary
This paper investigates the sample complexity of learning a near-optimal policy in reward-free kernel-based reinforcement learning: how many samples are required to design a near-optimal policy? Whereas existing work answers this question only under restrictive assumptions on the class of kernel functions, we handle a broad class of kernels with a simpler algorithm than prior work. We first establish sample-complexity guarantees under a generative-model assumption, then relax this assumption at the cost of an extra factor of H, the episode length, in the sample complexity. A key technical ingredient is a new construction of confidence intervals for kernel ridge regression, tailored to the RL setting, which may be of broader applicability. Simulations corroborate the theoretical findings. Key contributions include: (i) sample-complexity guarantees for a broad class of kernels via a simpler algorithm than prior work; (ii) an RL-specific confidence-interval construction for kernel ridge regression; and (iii) an extension from the generative-model setting to the episodic setting at the cost of a factor of H.
📝 Abstract
Reinforcement Learning (RL) problems are being considered under increasingly complex structures. While tabular and linear models have been thoroughly explored, the analytical study of RL under nonlinear function approximation, especially kernel-based models, has recently gained traction, owing to the strong representational capacity and theoretical tractability of such models. In this context, we examine the question of statistical efficiency in kernel-based RL within the reward-free framework, specifically asking: how many samples are required to design a near-optimal policy? Existing work addresses this question only under restrictive assumptions on the class of kernel functions. We first explore this question assuming access to a generative model, then relax this assumption at the cost of an extra factor of H, the episode length, in the sample complexity. We tackle this fundamental problem for a broad class of kernels and with a simpler algorithm than prior work. Our approach derives new confidence intervals for kernel ridge regression, specific to our RL setting, which may be of broader applicability. We further validate our theoretical findings through simulations.
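The confidence intervals central to the abstract build on kernel ridge regression. The paper's RL-specific construction is not reproduced here; as a point of reference, the sketch below shows the standard ingredients: the KRR predictive mean and a width proportional to the Gaussian-process-style posterior standard deviation. The RBF kernel, the regularization parameter `lam`, and the scaling parameter `beta` are illustrative choices, not the paper's.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # Squared-exponential (RBF) kernel matrix between two point sets.
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / (2 * lengthscale**2))

def krr_confidence_interval(X, y, X_test, lam=1.0, beta=2.0):
    """KRR predictive mean with a generic confidence band of half-width
    beta * posterior std; the paper derives a sharper, RL-specific beta."""
    n = len(X)
    K = rbf_kernel(X, X)                 # (n, n) Gram matrix
    k_star = rbf_kernel(X, X_test)       # (n, m) cross-kernel
    A = K + lam * np.eye(n)
    alpha = np.linalg.solve(A, y)
    mean = k_star.T @ alpha              # KRR prediction at test points
    # Posterior variance: k(x,x) - k_*^T (K + lam I)^{-1} k_*  (here k(x,x)=1)
    var = np.clip(1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0),
                  0.0, None)
    width = beta * np.sqrt(var)
    return mean, mean - width, mean + width
```

In an RL setting, such intervals are typically applied to value or transition estimates at each step; the band widens in poorly explored regions, which is what drives reward-free exploration.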