Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of effectively transferring reward functions learned from expert demonstrations in a source environment to a target reinforcement learning setting. To this end, the authors propose a coupled modeling approach that jointly constructs Bellman equations for both source and target domains, enabling a minimax estimation framework to directly solve for the soft Q-functions in both domains simultaneously. This circumvents the error propagation inherent in sequential estimation procedures. Theoretical analysis demonstrates that the method eliminates the first-order influence of source-domain Bellman residuals on the target policy and establishes finite-sample error bounds for the soft Q-functions as well as policy regret bounds. Empirical evaluation on a sepsis simulator shows that the proposed method outperforms conventional sequential transfer strategies.

📝 Abstract

We study the transfer of rewards learned using inverse reinforcement learning from expert demonstrations in one environment to reinforcement learning in a new, different environment. This arises naturally when demonstrations are collected in a controlled environment. We formulate the problem as a joint system of Bellman equations across the source and target environments and develop minimax estimators for the target soft-$q$-function. Whereas a sequential solution approach first estimates the source reward and then plugs it into the target control problem, a coupled approach solves the source and target system of equations jointly. We show that, in contrast to the sequential approach, the coupled approach removes the first-order influence of source Bellman residual error. We characterize the local behavior of each approach, develop finite-sample soft-$q$-function error bounds, and prove regret guarantees for the resulting soft-control policy. An empirical investigation using a sepsis simulator validates the theoretical comparison.
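To make the sequential baseline described in the abstract concrete, the following is a minimal tabular sketch and not the authors' estimator. It assumes the standard entropy-regularized (soft) Bellman equation Q(s,a) = r(s,a) + γ E[V(s')] with V(s) = log Σ_a exp Q(s,a), known transition kernels, and an exactly known source soft Q-function. All names (`soft_q_iteration`, `reward_from_soft_q`, `p_source`, `p_target`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def soft_value(q):
    # Soft value V(s) = log sum_a exp Q(s, a), computed stably.
    m = q.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(q - m).sum(axis=1, keepdims=True))).ravel()

def soft_q_iteration(reward, transition, gamma, iters=500):
    # Fixed-point iteration on the assumed soft Bellman equation.
    # reward has shape (S, A); transition has shape (S, A, S).
    q = np.zeros_like(reward)
    for _ in range(iters):
        q = reward + gamma * transition @ soft_value(q)
    return q

def reward_from_soft_q(q, transition, gamma):
    # Invert the soft Bellman equation: r = Q - gamma * E[V(s')].
    return q - gamma * transition @ soft_value(q)

def random_transition(rng, n_states, n_actions):
    # Hypothetical random dynamics, normalized over next states.
    p = rng.random((n_states, n_actions, n_states))
    return p / p.sum(axis=2, keepdims=True)

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9
p_source = random_transition(rng, n_states, n_actions)
p_target = random_transition(rng, n_states, n_actions)
true_reward = rng.normal(size=(n_states, n_actions))

# Step 1 (source): the expert's soft Q-function, assumed known exactly here.
q_source = soft_q_iteration(true_reward, p_source, gamma)
# Step 2 (source): recover the reward from the source Bellman equation.
recovered_reward = reward_from_soft_q(q_source, p_source, gamma)
# Step 3 (target): plug the recovered reward into the target control problem.
q_target = soft_q_iteration(recovered_reward, p_target, gamma)
```

In this idealized setting the plug-in step is exact. With an estimated source soft Q-function, any source Bellman residual would enter `recovered_reward` and carry into `q_target`, which is the error propagation the coupled approach is reported to remove to first order.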

Problem

Research questions and friction points this paper is trying to address.

reward transfer

inverse reinforcement learning

reinforcement learning

environment transfer

Bellman equations

Innovation

Methods, ideas, or system contributions that make the work stand out.

reward transfer

inverse reinforcement learning

coupled minimax

soft-q-function

Bellman equations

🔎 Similar Papers

Multi Task Inverse Reinforcement Learning for Common Sense Reward

2024-02-17 · arXiv.org · Citations: 0

On Reward Transferability in Adversarial Inverse Reinforcement Learning: Insights from Random Matrix Theory

2024-10-10 · Citations: 0