Multi Task Inverse Reinforcement Learning for Common Sense Reward

📅 2024-02-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In complex environments, hand-crafted reward functions can lead to reward hacking when the designed reward is misaligned with the true task objective. Method: This paper proposes decoupling the reward function into a simple task-specific component and an implicit common-sense component, and learning the latter from expert demonstrations with multi-task inverse reinforcement learning (MT-IRL). The authors first show that single-task IRL, even when it succeeds in training an agent, does not recover a reusable reward function; jointly modeling expert demonstrations across multiple tasks disentangles the shared common-sense component from task particulars. Contribution/Results: The work characterizes when a learned common-sense reward transfers across tasks, and experiments demonstrate that reusing it in novel tasks improves policy generalization and behavioral stability while mitigating reward hacking, yielding a principled framework for transferring latent, task-agnostic reward structure across sequential decision-making tasks.

📝 Abstract
One of the challenges in applying reinforcement learning in a complex real-world environment lies in providing the agent with a sufficiently detailed reward function. Any misalignment between the reward and the desired behavior can result in unwanted outcomes, such as "reward hacking", where the agent maximizes rewards through unintended behavior. In this work, we propose to disentangle the reward into two distinct parts: a simple task-specific reward, outlining the particulars of the task at hand, and an unknown common-sense reward, indicating the expected behavior of the agent within the environment. We then explore how this common-sense reward can be learned from expert demonstrations. We first show that inverse reinforcement learning, even when it succeeds in training an agent, does not learn a useful reward function; that is, training a new agent with the learned reward does not reproduce the desired behavior. We then demonstrate that this problem can be solved by training simultaneously on multiple tasks; that is, multi-task inverse reinforcement learning can be applied to learn a useful reward function.
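The reward decomposition the abstract describes can be sketched in a few lines. This is a minimal illustration with hypothetical names (not the authors' code): each task keeps its own simple task reward, while a single common-sense term, which the paper learns from demonstrations, is shared across all tasks. Here the common-sense term is hard-coded for clarity.

```python
# Sketch of the paper's reward decomposition: total reward =
# hand-specified task reward + shared common-sense reward.
# All names below are illustrative assumptions, not the authors' API.

def make_total_reward(task_reward, common_sense_reward):
    """Compose a per-task reward with the shared common-sense term."""
    def total(state, action):
        return task_reward(state, action) + common_sense_reward(state, action)
    return total

# Toy tasks in the same environment: each task reward only describes
# the goal, saying nothing about how the agent should behave.
def reach_goal_reward(state, action):
    return 1.0 if state == "at_goal" else 0.0

def fetch_item_reward(state, action):
    return 1.0 if state == "holding_item" else 0.0

def learned_common_sense(state, action):
    # In the paper this term is learned via multi-task IRL from expert
    # demonstrations; hard-coded here to penalize destructive behavior.
    return -5.0 if action == "break_object" else 0.0

task_a = make_total_reward(reach_goal_reward, learned_common_sense)
task_b = make_total_reward(fetch_item_reward, learned_common_sense)
```

Because `learned_common_sense` is shared, any behavior it discourages (here, `"break_object"`) is discouraged in every task, which is exactly the transfer property the paper studies.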
Problem

Research questions and friction points this paper is trying to address.

Learning common-sense rewards from expert demonstrations
Preventing reward hacking in complex reinforcement learning
Disentangling task-specific and common-sense reward components
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangling reward into task-specific and common-sense components
Learning common-sense rewards from expert demonstrations via inverse reinforcement learning
Applying multi-task inverse reinforcement learning to learn a transferable reward function
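The multi-task idea above can be made concrete with a feature-matching-style IRL update, a deliberate simplification of what the paper does (the authors use a different IRL algorithm; every name here is my own): a single shared reward weight vector is updated jointly from expert and policy rollouts on several tasks, so task-specific quirks average out and only the shared common-sense structure accumulates in the weights.

```python
# Toy multi-task IRL skeleton (feature-matching style; an illustrative
# simplification, not the paper's algorithm). One shared weight vector
# parameterizes the common-sense reward across all tasks.

def feature_expectations(trajectories, featurize):
    """Average feature vector over all states in a set of trajectories."""
    total, count = None, 0
    for traj in trajectories:
        for state in traj:
            f = featurize(state)
            total = f if total is None else [a + b for a, b in zip(total, f)]
            count += 1
    return [x / count for x in total]

def mtirl_update(weights, tasks, featurize, lr=0.1):
    """One gradient step, averaged over tasks: move the shared reward
    weights toward expert feature expectations and away from the
    current policies' feature expectations."""
    grad = [0.0] * len(weights)
    for expert_trajs, policy_trajs in tasks:
        mu_e = feature_expectations(expert_trajs, featurize)
        mu_p = feature_expectations(policy_trajs, featurize)
        grad = [g + (e - p) for g, e, p in zip(grad, mu_e, mu_p)]
    n = len(tasks)
    return [w + lr * g / n for w, g in zip(weights, grad)]

# Toy usage: a single binary feature marking "safe" states. Experts
# stay in safe states more often than the current policy, so the
# weight on the safe feature should increase.
featurize = lambda state: [1.0 if state == "safe" else 0.0]
expert_trajs = [["safe", "safe"]]
policy_trajs = [["unsafe", "safe"]]
weights = mtirl_update([0.0], [(expert_trajs, policy_trajs)], featurize)
```

With several tasks in the `tasks` list, the averaging inside `mtirl_update` is what isolates the shared component: feature directions that only one task's experts favor contribute little, while directions all experts share reinforce each other.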