Model Predictive Adversarial Imitation Learning for Planning from Observation

📅 2025-07-29
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the unreliability of learning planning behaviors from ambiguous and incomplete human demonstrations. We propose an end-to-end imitation learning framework that unifies inverse reinforcement learning (IRL) with model predictive control (MPC). Its core innovation is to replace the conventional black-box policy with an interpretable, planning-based policy, thereby bridging adversarial imitation learning and explicit planning. The method learns a reward function and generates robust planned trajectories from a few, or even a single, observation-only demonstration, improving sample efficiency, out-of-distribution generalization, and robustness. We validate it on both simulated control benchmarks and real-world navigation tasks. The approach offers a pathway toward safe, interpretable, and data-efficient learning for autonomous agents.

📝 Abstract
Human demonstration data is often ambiguous and incomplete, motivating imitation learning approaches that also exhibit reliable planning behavior. A common paradigm for planning from demonstration involves learning a reward function via Inverse Reinforcement Learning (IRL) and then deploying this reward via Model Predictive Control (MPC). Towards unifying these methods, we derive a replacement of the policy in IRL with a planning-based agent. With connections to Adversarial Imitation Learning, this formulation enables end-to-end interactive learning of planners from observation-only demonstrations. In addition to benefits in interpretability, complexity, and safety, we study and observe significant improvements in sample efficiency, out-of-distribution generalization, and robustness. The study includes evaluations in both simulated control benchmarks and real-world navigation experiments using few-to-single observation-only demonstrations.
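To make the "learn a reward, then deploy it via MPC" paradigm concrete, here is a minimal sketch of a random-shooting planner that maximizes a reward over state transitions. It is not the paper's planner. The toy 1-D point-mass `dynamics`, the hand-written `learned_reward` (a stand-in for a reward that IRL would produce), and all function names and parameter values are assumptions made for illustration.

```python
import random

def dynamics(state, action, dt=0.1):
    """Toy 1-D point mass (assumed known model): state = (position, velocity)."""
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    return (pos, vel)

def learned_reward(state, next_state):
    """Hypothetical stand-in for an IRL-learned reward over transitions (s, s')."""
    pos, vel = next_state
    return -(pos - 1.0) ** 2 - 0.1 * vel ** 2

def plan(state, reward_fn, horizon=10, num_samples=200, rng=None):
    """Random-shooting MPC: sample action sequences, roll each out through the
    model, and return (first action, return) of the best-scoring sequence."""
    if rng is None:
        rng = random.Random(0)
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s_next = dynamics(s, a)
            total += reward_fn(s, s_next)
            s = s_next
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action, best_return

def run_mpc(initial_state, steps=50, seed=0):
    """Receding-horizon loop: replan at every step, execute only the first action."""
    rng = random.Random(seed)
    state = initial_state
    for _ in range(steps):
        action, _ = plan(state, learned_reward, rng=rng)
        state = dynamics(state, action)
    return state

if __name__ == "__main__":
    print(run_mpc((0.0, 0.0)))
```

Because the agent is a planner rather than a learned policy network, changing the reward changes behavior immediately at the next replanning step, which is one reason a planning-based agent is attractive inside a reward-learning loop.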
Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguous and incomplete human demonstration data
Unifies reward learning and planning in imitation learning
Improves sample efficiency and generalization in planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Predictive Adversarial Imitation Learning
Planning-based agent replaces IRL policy
End-to-end interactive observation-only learning
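
The observation-only adversarial idea listed above can be illustrated with a small sketch: a discriminator is trained to tell expert state transitions (s, s') from the agent's, with no actions involved, and its logit is then used as a reward. This is a generic illustration of adversarial imitation from observations, not the paper's formulation. The linear discriminator, the toy data (the expert moves right, the agent moves left), and every name below are assumptions.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def features(transition):
    """Features of a state transition (s, s'): state, displacement, bias."""
    s, s_next = transition
    return [s, s_next - s, 1.0]

def logit(weights, transition):
    return sum(w * f for w, f in zip(weights, features(transition)))

def train_discriminator(expert, agent, epochs=200, lr=0.5, seed=0):
    """Logistic regression via SGD: label 1 for expert, 0 for agent transitions."""
    rng = random.Random(seed)
    data = [(t, 1.0) for t in expert] + [(t, 0.0) for t in agent]
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        rng.shuffle(data)
        for transition, label in data:
            error = label - sigmoid(logit(weights, transition))
            feats = features(transition)
            weights = [w + lr * error * f for w, f in zip(weights, feats)]
    return weights

def imitation_reward(weights, transition):
    """log D - log(1 - D), which equals the logit for a sigmoid discriminator."""
    return logit(weights, transition)

def make_toy_data(n=50, seed=1):
    """Hypothetical data: expert steps +0.1 per transition, agent steps -0.1."""
    rng = random.Random(seed)
    expert = [(s, s + 0.1) for s in (rng.random() for _ in range(n))]
    agent = [(s, s - 0.1) for s in (rng.random() for _ in range(n))]
    return expert, agent

if __name__ == "__main__":
    expert, agent = make_toy_data()
    w = train_discriminator(expert, agent)
    print(imitation_reward(w, (0.5, 0.6)), imitation_reward(w, (0.5, 0.4)))
```

In a full adversarial loop, the agent (here, it would be a planner) would act to maximize this reward, the resulting transitions would replace the `agent` set, and the discriminator would be retrained, alternating until the two sets become hard to distinguish.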