Fast Rates for Inverse Reinforcement Learning

📅 2026-05-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work studies inverse reinforcement learning (IRL) for finite-horizon Markov decision processes with linear reward classes, analyzing an entropy-regularized min-max IRL framework whose guarantees hold under model misspecification and require no exploration assumptions. It shows that maximum likelihood estimation and min-max IRL are equivalent at the population level, and at the empirical level under deterministic dynamics. Exploiting pseudo-self-concordance of the min-max IRL loss, the paper proves that, with $n$ expert trajectories, both the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at the fast rate $\mathcal{O}(n^{-1})$, improving on the classical $\mathcal{O}(n^{-1/2})$ rate. The study also extends reward-identifiability results to general Borel state and action spaces and derives new results on the derivatives of the soft-optimal value function with respect to the reward parameters.
📝 Abstract
We establish novel structural and statistical results for entropy-regularized min-max inverse reinforcement learning (Min-Max-IRL) with linear reward classes in finite-horizon MDPs with Borel state and action spaces. On the structural side, we show that maximum likelihood estimation (MLE) and Min-Max-IRL are equivalent at the population level, and at the empirical level under deterministic dynamics. On the statistical side, exploiting pseudo-self-concordance of the Min-Max-IRL loss, we prove that both the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at the fast rate $\mathcal{O}(n^{-1})$, where $n$ is the number of expert trajectories. Our guarantees apply under misspecification and require no exploration assumptions. We further extend reward-identifiability results to general Borel spaces and derive novel results on the derivatives of the soft-optimal value function with respect to reward parameters.
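To make the objects in the abstract concrete, here is a minimal sketch of the soft-optimal value function for a linear reward $r_\theta(s,a) = \theta^\top \phi(s,a)$, together with a numerical check of the standard identity that its gradient with respect to $\theta$ equals the expected feature sum under the soft-optimal policy. This is not the paper's construction: it assumes a small tabular MDP (the paper works in general Borel spaces), a temperature of 1, and hypothetical toy values for `PHI`, `P`, `H`, and `THETA`.

```python
import math

def soft_value_iteration(theta, phi, P, H):
    """Backward soft Bellman recursion for a tabular finite-horizon MDP.

    Returns (values, policies), each a list indexed by step 0..H-1.
    Assumes temperature 1 and reward theta . phi[s][a].
    """
    S, A = len(phi), len(phi[0])
    V_next = [0.0] * S  # terminal value is zero
    values, policies = [], []
    for _ in range(H):
        V, pi = [], []
        for s in range(S):
            q = [sum(t * f for t, f in zip(theta, phi[s][a]))
                 + sum(P[s][a][s2] * V_next[s2] for s2 in range(S))
                 for a in range(A)]
            m = max(q)  # stabilized log-sum-exp
            v = m + math.log(sum(math.exp(x - m) for x in q))
            V.append(v)
            pi.append([math.exp(x - v) for x in q])  # soft-optimal policy
        values.append(V)
        policies.append(pi)
        V_next = V
    values.reverse()
    policies.reverse()
    return values, policies

def expected_features(policies, phi, P, s0):
    """Expected sum of features over the horizon, starting from state s0."""
    S, A, d = len(phi), len(phi[0]), len(phi[0][0])
    dist = [0.0] * S
    dist[s0] = 1.0
    mu = [0.0] * d
    for pi in policies:
        nxt = [0.0] * S
        for s in range(S):
            for a in range(A):
                w = dist[s] * pi[s][a]
                for k in range(d):
                    mu[k] += w * phi[s][a][k]
                for s2 in range(S):
                    nxt[s2] += w * P[s][a][s2]
        dist = nxt
    return mu

# Hypothetical toy problem: 2 states, 2 actions, 2 features, horizon 3.
PHI = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
H, THETA, S0 = 3, [0.3, -0.2], 0

def soft_value_at(theta):
    values, _ = soft_value_iteration(theta, PHI, P, H)
    return values[0][S0]

# Analytic gradient: expected feature sum under the soft-optimal policy.
_, policies = soft_value_iteration(THETA, PHI, P, H)
mu = expected_features(policies, PHI, P, S0)

# Central finite differences of the soft-optimal value as a cross-check.
eps = 1e-5
fd = []
for k in range(len(THETA)):
    up = [t + (eps if i == k else 0.0) for i, t in enumerate(THETA)]
    dn = [t - (eps if i == k else 0.0) for i, t in enumerate(THETA)]
    fd.append((soft_value_at(up) - soft_value_at(dn)) / (2 * eps))
```

In this tabular setting, `mu` and `fd` should agree up to finite-difference error. In maximum-entropy IRL with linear rewards, the log-likelihood gradient is commonly written as the gap between empirical expert feature sums and such expected feature sums, which is one way to see why likelihood-based and min-max formulations are closely related.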
Problem

Research questions and friction points this paper is trying to address.

inverse reinforcement learning
entropy regularization
fast rates
reward identifiability
Borel spaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

inverse reinforcement learning
fast convergence rate
entropy regularization
reward identifiability
pseudo-self-concordance