Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a gap in the theoretical understanding of nearly minimax-optimal reinforcement learning algorithms under gap-dependent settings. Focusing on the linear function approximation framework, the authors establish the first gap-dependent regret bound for the LSVI-UCB++ algorithm, improving the dependence on both the feature dimension \(d\) and the horizon length \(H\) over prior gap-dependent analyses. Furthermore, they propose a multi-agent parallelized variant of the algorithm that exploits its low policy-switching property to enable efficient concurrent exploration. This extension achieves the first gap-dependent sample complexity upper bound with linear speedup in the number of agents for online multi-agent reinforcement learning, improving on all prior results in this setting.

📝 Abstract
We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly minimax-optimal worst-case regret bound $\tilde{O}(d\sqrt{H^3K})$, where $d$ is the feature dimension, $H$ is the horizon length, and $K$ is the number of episodes. We bridge this gap by providing the first gap-dependent regret bound for the nearly minimax-optimal algorithm LSVI-UCB++ (He et al., 2023). Our analysis yields improved dependencies on both $d$ and $H$ compared to previous gap-dependent results. Moreover, leveraging the low policy-switching property of LSVI-UCB++, we introduce a concurrent variant that enables efficient parallel exploration across multiple agents and establish the first gap-dependent sample complexity upper bound for online multi-agent RL with linear function approximation, achieving linear speedup with respect to the number of agents.
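For context, a brief sketch of what "gap-dependent" means here. The specific polynomial factors below are illustrative of the standard form of such bounds in the literature, not the paper's actual result:

```latex
% Suboptimality gap of action a at state s and step h:
%   Delta_h(s,a) = V_h^*(s) - Q_h^*(s,a),
% and the minimal positive gap over all (s,a,h):
%   Delta_min = min { Delta_h(s,a) : Delta_h(s,a) > 0 }.
\Delta_h(s,a) \;=\; V_h^*(s) - Q_h^*(s,a),
\qquad
\Delta_{\min} \;=\; \min_{(s,a,h)\,:\,\Delta_h(s,a)>0} \Delta_h(s,a).

% Worst-case (minimax) regret grows as sqrt(K) in the number of
% episodes K, e.g. the LSVI-UCB++ bound quoted in the abstract:
\mathrm{Regret}(K) \;=\; \tilde{O}\!\left(d\sqrt{H^3 K}\right).

% A gap-dependent bound instead grows only logarithmically in K,
% at the price of a 1/Delta_min factor (poly(d,H) is generic here):
\mathrm{Regret}(K) \;=\; O\!\left(\frac{\mathrm{poly}(d,H)\,\log K}{\Delta_{\min}}\right).
```

The paper's contribution is a bound of the second type whose $\mathrm{poly}(d,H)$ factor improves on earlier gap-dependent analyses, attached to an algorithm that simultaneously attains the first, nearly minimax-optimal, worst-case rate.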
Problem

Research questions and friction points this paper is trying to address.

gap-dependent regret
nearly minimax-optimal
linear function approximation
multi-agent reinforcement learning
sample complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

gap-dependent regret
nearly minimax-optimal
linear function approximation
multi-agent reinforcement learning
low policy switching