Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

📅 2026-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of transferring reward functions learned from expert demonstrations in a source environment to reinforcement learning in a different target environment. The authors formulate the problem as a joint system of Bellman equations across the source and target domains and develop minimax estimators that solve this coupled system directly for the soft Q-functions. This avoids the error propagation of the sequential approach, which first estimates the source reward and then plugs it into the target control problem. The theoretical analysis shows that the coupled method eliminates the first-order influence of source-domain Bellman residual error, and it establishes finite-sample error bounds for the soft Q-functions as well as regret bounds for the resulting soft-control policy. An empirical evaluation on a sepsis simulator supports this comparison, with the coupled method outperforming the conventional sequential transfer strategy.
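To make "a joint system of Bellman equations" concrete, here is a minimal sketch assuming a standard entropy-regularized (soft) MDP with discount factor $\gamma$, a reward $r$ shared by both environments, and source and target transition kernels $P^{S}$ and $P^{T}$. This notation is ours and the paper's exact formulation may differ. Writing the soft Bellman equation in each environment and eliminating the shared reward gives one equation linking the two soft-$q$-functions:

```latex
\begin{aligned}
V_q(s) &= \log \sum_{a} \exp q(s,a), \qquad
\pi_q(a \mid s) = \exp\big(q(s,a) - V_q(s)\big), \\
q^{S}(s,a) &= r(s,a) + \gamma\, \mathbb{E}_{s' \sim P^{S}(\cdot \mid s,a)}\big[V_{q^{S}}(s')\big], \\
q^{T}(s,a) &= r(s,a) + \gamma\, \mathbb{E}_{s' \sim P^{T}(\cdot \mid s,a)}\big[V_{q^{T}}(s')\big], \\
\Longrightarrow\quad
q^{T}(s,a) - \gamma\, \mathbb{E}_{s' \sim P^{T}(\cdot \mid s,a)}\big[V_{q^{T}}(s')\big]
&= q^{S}(s,a) - \gamma\, \mathbb{E}_{s' \sim P^{S}(\cdot \mid s,a)}\big[V_{q^{S}}(s')\big].
\end{aligned}
```

Under these assumptions the expert policy in the source is $\pi_{q^{S}}$. A sequential procedure first forms $\hat r = \hat q^{S} - \gamma P^{S} V_{\hat q^{S}}$ and then solves the target equation with $\hat r$, whereas a coupled procedure works with the last line (together with the source-side constraints) directly.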
πŸ“ Abstract
We study the transfer of rewards learned using inverse reinforcement learning from expert demonstrations in one environment to reinforcement learning in a new, different environment. This arises naturally when demonstrations are collected in a controlled environment. We formulate the problem as a joint system of Bellman equations across the source and target environments and develop minimax estimators for the target soft-$q$-function. Whereas a sequential solution approach first estimates the source reward and then plugs it into the target control problem, a coupled approach solves the source and target system of equations jointly. We show that, in contrast to the sequential approach, the coupled approach removes the first-order influence of source Bellman residual error. We characterize the local behavior of each approach, develop finite-sample soft-$q$-function error bounds, and prove regret guarantees for the resulting soft-control policy. An empirical investigation using a sepsis simulator validates the theoretical comparison.
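The sequential pipeline described in the abstract can be illustrated in the simplest setting: a tabular MDP where the source soft-$q$-function and both transition kernels are known exactly. This is a sketch under loudly stated assumptions. The random MDP, the helper names (`recover_reward`, `soft_value_iteration`, `coupled_residual`), and the log-sum-exp soft Bellman operator are ours, not the paper's estimators. With exact inputs the sequential plug-in solution also satisfies the coupled equation. The differences the paper analyzes (first-order influence of source Bellman residual error, finite-sample bounds) arise only when these quantities are estimated from data, which this sketch does not attempt.

```python
import numpy as np


def soft_value(q):
    """V_q(s) = log sum_a exp q(s, a), computed in a numerically stable way."""
    m = q.max(axis=1)
    return m + np.log(np.exp(q - m[:, None]).sum(axis=1))


def soft_policy(q):
    """Soft-control policy pi(a | s) = exp(q(s, a) - V_q(s))."""
    return np.exp(q - soft_value(q)[:, None])


def recover_reward(q_src, P_src, gamma):
    """Inverse soft Bellman step in the source: r = q^S - gamma * P^S V_{q^S}.

    P_src has shape (S, A, S) with P_src[s, a, s'] = Pr(s' | s, a).
    """
    return q_src - gamma * P_src @ soft_value(q_src)


def soft_value_iteration(r, P, gamma, tol=1e-10, max_iter=10_000):
    """Solve q = r + gamma * P V_q by fixed-point iteration (a gamma-contraction)."""
    q = np.zeros_like(r)
    for _ in range(max_iter):
        q_new = r + gamma * P @ soft_value(q)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q


def coupled_residual(q_src, q_tgt, P_src, P_tgt, gamma):
    """Residual of the joint system after eliminating the shared reward r."""
    lhs = q_tgt - gamma * P_tgt @ soft_value(q_tgt)
    rhs = q_src - gamma * P_src @ soft_value(q_src)
    return lhs - rhs


def random_transitions(rng, S, A):
    """Hypothetical random dynamics: each P[s, a, :] is a probability vector."""
    P = rng.random((S, A, S))
    return P / P.sum(axis=2, keepdims=True)


rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
r_true = rng.normal(size=(S, A))          # hypothetical shared reward
P_src = random_transitions(rng, S, A)     # hypothetical source dynamics
P_tgt = random_transitions(rng, S, A)     # hypothetical target dynamics

# Exact source soft-q; stands in for what IRL on expert data would estimate.
q_src = soft_value_iteration(r_true, P_src, gamma)

# Sequential: recover the reward, then plug it into the target control problem.
r_hat = recover_reward(q_src, P_src, gamma)
q_tgt = soft_value_iteration(r_hat, P_tgt, gamma)
pi_tgt = soft_policy(q_tgt)

# With exact inputs, the plug-in solution also solves the coupled equation.
print(np.max(np.abs(coupled_residual(q_src, q_tgt, P_src, P_tgt, gamma))))
```

Replacing `q_src` with a noisy estimate is one way to probe how source error propagates into `q_tgt` under the sequential route; the paper's coupled minimax estimator is not reproduced here.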
Problem

Research questions and friction points this paper is trying to address.

reward transfer
inverse reinforcement learning
reinforcement learning
environment transfer
Bellman equations
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward transfer
inverse reinforcement learning
coupled minimax
soft-q-function
Bellman equations