AI Summary
This work addresses the unreliability of learning planning behaviors from ambiguous and incomplete human demonstrations. We propose an end-to-end imitation learning framework that unifies inverse reinforcement learning (IRL) with model predictive control (MPC). Its core innovation lies in replacing the conventional black-box policy with an interpretable, planning-based policy, thereby bridging adversarial imitation learning and explicit planning paradigms. The method models the reward function and generates robust planning trajectories from as few as a single demonstration, achieving high sample efficiency while significantly improving out-of-distribution generalization and robustness. We validate its effectiveness on both simulated control benchmarks and real-world navigation tasks. Our approach offers a novel pathway toward safe, interpretable, and data-efficient learning for autonomous agents.
Abstract
Human demonstration data is often ambiguous and incomplete, motivating imitation learning approaches that also exhibit reliable planning behavior. A common paradigm for planning-from-demonstration involves learning a reward function via Inverse Reinforcement Learning (IRL) and then deploying this reward via Model Predictive Control (MPC). Towards unifying these methods, we derive a formulation that replaces the policy in IRL with a planning-based agent. Through connections to Adversarial Imitation Learning, this formulation enables end-to-end interactive learning of planners from observation-only demonstrations. In addition to benefits in interpretability, complexity, and safety, we observe significant improvements in sample efficiency, out-of-distribution generalization, and robustness. The study includes evaluations on both simulated control benchmarks and real-world navigation experiments using few-to-single observation-only demonstrations.
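The coupling the abstract describes, using a planner as the policy inside an IRL loop, can be sketched in a toy form. The code below is a minimal illustrative sketch, not the paper's algorithm: it assumes a 1-D point-mass task, a linear reward over hand-picked features, feature-matching IRL updates, and random-shooting MPC as the planner. All names and the task setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s):
    # Toy feature: negative absolute distance to a goal at the origin.
    return np.array([-abs(s)])

def mpc_plan(s, w, horizon=5, n_samples=64):
    """Random-shooting MPC: sample action sequences, score them under the
    current learned reward w @ features(s), return the best first action."""
    best_a, best_ret = 0.0, -np.inf
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, horizon)
        sim, ret = s, 0.0
        for a in actions:
            sim = sim + a
            ret += w @ features(sim)
        if ret > best_ret:
            best_ret, best_a = ret, actions[0]
    return best_a

# A single observation-only demonstration: states moving toward the goal.
demo_states = [4.0, 3.0, 2.0, 1.0, 0.0]
mu_demo = np.mean([features(s) for s in demo_states], axis=0)

# IRL loop: roll out the MPC policy under the current reward, then nudge
# the reward weights toward matching the demonstration's feature counts.
w = np.zeros(1)
for _ in range(20):
    s, agent_feats = 4.0, []
    for _ in range(len(demo_states)):
        agent_feats.append(features(s))
        s = s + mpc_plan(s, w)
    mu_agent = np.mean(agent_feats, axis=0)
    w += 0.5 * (mu_demo - mu_agent)
```

After a few iterations the learned weight becomes positive, i.e., the reward favors approaching the goal and the MPC rollouts begin to resemble the demonstration. The point of the sketch is only the structure: the planner, not a black-box policy, sits inside the reward-learning loop.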