Is Optimal Transport Necessary for Inverse Reinforcement Learning?

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the theoretical necessity of optimal transport (OT) for reward function recovery in inverse reinforcement learning (IRL). We propose two optimization-free, linear-time reward construction methods: (i) minimum-distance reward, grounded in heuristic state distances; and (ii) segment-matching reward, based on trajectory segment alignment. Both rely solely on elementary nearest-neighbor alignment—bypassing computationally expensive OT couplings—thereby enhancing computational efficiency and interpretability. Evaluated across 32 online and offline benchmark tasks, our approaches match or surpass state-of-the-art OT-based IRL methods in performance, demonstrating the feasibility of a lightweight, robust, and generalizable IRL paradigm. Our core contribution is the theoretical and empirical demonstration that OT is not indispensable for IRL, and the establishment of a concise, efficient, and analytically tractable alternative framework grounded in local alignment principles.

📝 Abstract
Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations. Recently, Optimal Transport (OT) methods have been successfully deployed to align trajectories and infer rewards. While OT-based methods have shown strong empirical results, they introduce algorithmic complexity and hyperparameter sensitivity, and require solving an OT optimization problem. In this work, we challenge the necessity of OT in IRL by proposing two simple, heuristic alternatives: (1) Minimum-Distance Reward, which assigns rewards based on the nearest expert state regardless of temporal order; and (2) Segment-Matching Reward, which incorporates lightweight temporal alignment by matching agent states to corresponding segments in the expert trajectory. These methods avoid optimization, exhibit linear-time complexity, and are easy to implement. Through extensive evaluations across 32 online and offline benchmarks with three reinforcement learning algorithms, we show that our simple rewards match or outperform recent OT-based approaches. Our findings suggest that the core benefits of OT may arise from basic proximity alignment rather than its optimal coupling formulation, advocating for a reevaluation of complexity in future IRL design.
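The Minimum-Distance Reward described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's exact implementation: it assumes states are fixed-size feature vectors and that the reward is the negative Euclidean distance to the nearest expert state; the function name and any scaling are this sketch's own choices.

```python
import numpy as np

def min_distance_reward(agent_states, expert_states):
    """Minimum-Distance Reward (sketch): score each agent state by the
    negative Euclidean distance to its nearest expert state, ignoring
    temporal order entirely. No optimization, no OT coupling."""
    # Pairwise distances between agent and expert states: (T_agent, T_expert)
    dists = np.linalg.norm(
        agent_states[:, None, :] - expert_states[None, :, :], axis=-1
    )
    # One reward per agent state: closer to the expert set => higher reward
    return -dists.min(axis=1)

# Usage: an agent state that coincides with an expert state gets reward 0
agent = np.array([[0.0, 0.0], [1.0, 1.0]])
expert = np.array([[0.0, 0.0], [2.0, 2.0]])
rewards = min_distance_reward(agent, expert)
```

Because each agent state only needs its single nearest expert neighbor, the construction is embarrassingly simple; with a spatial index (e.g. a k-d tree) the lookup cost stays near-linear in trajectory length.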
Problem

Research questions and friction points this paper is trying to address.

Challenges necessity of Optimal Transport in Inverse Reinforcement Learning
Proposes simpler heuristic alternatives to OT-based reward inference
Demonstrates that basic proximity alignment matches OT performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimum-Distance Reward for proximity-based alignment
Segment-Matching Reward with lightweight temporal alignment
Linear-time complexity without OT optimization
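The Segment-Matching Reward's lightweight temporal alignment might look like the following sketch. The time-proportional index mapping, the fixed window size, and negative-distance rewards are assumptions made for illustration rather than the paper's exact scheme.

```python
import numpy as np

def segment_matching_reward(agent_states, expert_states, window=2):
    """Segment-Matching Reward (sketch): compare each agent state at time t
    only against a short expert segment centered at the time-proportional
    index, adding lightweight temporal alignment to nearest-neighbor
    matching. Runs in O(T * window) time, linear for a fixed window."""
    t_agent, t_expert = len(agent_states), len(expert_states)
    rewards = np.empty(t_agent)
    for t in range(t_agent):
        # Map agent step t to the proportional position in the expert trajectory
        center = int(round(t * (t_expert - 1) / max(t_agent - 1, 1)))
        lo, hi = max(0, center - window), min(t_expert, center + window + 1)
        segment = expert_states[lo:hi]
        # Reward is the negative distance to the nearest state in the segment
        rewards[t] = -np.linalg.norm(segment - agent_states[t], axis=-1).min()
    return rewards

# Usage: an agent that retraces the expert trajectory earns zero penalty
expert = np.linspace(0.0, 3.0, 4).reshape(4, 1)
rewards = segment_matching_reward(expert.copy(), expert)
```

Restricting the search to a temporal window is what distinguishes this from the Minimum-Distance Reward: an agent state is only rewarded for being near the expert at roughly the same phase of the trajectory.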