Model Predictive Adversarial Imitation Learning for Planning from Observation

๐Ÿ“… 2025-07-29
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
This work addresses the unreliability of learning planning behaviors from ambiguous and incomplete human demonstrations. We propose an end-to-end imitation learning framework that unifies inverse reinforcement learning (IRL) with model predictive control (MPC). Its core innovation lies in replacing conventional black-box policies with an interpretable, planning-based policy, thereby bridging adversarial imitation learning and explicit planning paradigms. The method learns the reward function and generates robust planning trajectories from only a few demonstrations, or even a single one, achieving high sample efficiency while significantly improving out-of-distribution generalization and robustness. We validate its effectiveness on both simulated control benchmarks and real-world navigation tasks. Our approach offers a novel pathway toward safe, interpretable, and data-efficient autonomous agent learning.

๐Ÿ“ Abstract
Human demonstration data is often ambiguous and incomplete, motivating imitation learning approaches that also exhibit reliable planning behavior. A common paradigm for planning-from-demonstration involves learning a reward function via Inverse Reinforcement Learning (IRL), then deploying this reward via Model Predictive Control (MPC). Towards unifying these methods, we derive a replacement of the policy in IRL with a planning-based agent. With connections to Adversarial Imitation Learning, this formulation enables end-to-end interactive learning of planners from observation-only demonstrations. In addition to benefits in interpretability, complexity, and safety, we study and observe significant improvements in sample efficiency, out-of-distribution generalization, and robustness. The study includes evaluations in both simulated control benchmarks and real-world navigation experiments using few-to-single observation-only demonstrations.
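The loop the abstract describes can be sketched roughly as follows: a discriminator learned from expert states provides the reward, and an MPC planner acts as the generator policy. The 1-D point-mass task, random-shooting planner, and logistic discriminator below are illustrative assumptions for a minimal sketch, not the paper's actual implementation.

```python
# Minimal sketch of adversarial imitation learning with an MPC planner as the
# policy, on an assumed 1-D point-mass task (reach larger x). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def dynamics(x, u):
    """Point mass: state is position; action is a bounded velocity command."""
    return x + np.clip(u, -1.0, 1.0) * 0.1

def expert_demo(steps=20):
    """A single observation-only demonstration: states moving steadily right."""
    xs, x = [], 0.0
    for _ in range(steps):
        x = dynamics(x, 1.0)
        xs.append(x)
    return np.array(xs)

# Discriminator D(x) = sigmoid(w*x + b); its logit serves as the learned reward.
w, b = 0.0, 0.0

def reward(x):
    return w * x + b  # higher where states look "expert-like"

def mpc_plan(x0, horizon=5, samples=64):
    """Random-shooting MPC: sample action sequences, keep the best first action."""
    best_u, best_r = 0.0, -np.inf
    for _ in range(samples):
        us = rng.uniform(-1, 1, horizon)
        x, r = x0, 0.0
        for u in us:
            x = dynamics(x, u)
            r += reward(x)
        if r > best_r:
            best_r, best_u = r, us[0]  # receding horizon: execute first action
    return best_u

demo = expert_demo()
for it in range(40):
    # Roll out the MPC "generator" policy under the current learned reward.
    xs, x = [], 0.0
    for _ in range(len(demo)):
        x = dynamics(x, mpc_plan(x))
        xs.append(x)
    agent = np.array(xs)
    # Logistic-discriminator gradient steps: expert states -> 1, agent states -> 0.
    for x_e, x_a in zip(demo, agent):
        for x_i, label in ((x_e, 1.0), (x_a, 0.0)):
            p = 1.0 / (1.0 + np.exp(-(w * x_i + b)))
            w += 0.05 * (label - p) * x_i
            b += 0.05 * (label - p)

print(f"final agent position: {agent[-1]:.2f}")
```

As the discriminator's weight turns positive, the planner is driven toward expert-like states, so the agent's rollout converges toward the demonstrated trajectory without ever observing expert actions.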
Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguous and incomplete human demonstration data
Unifies reward learning and planning in imitation learning
Improves sample efficiency and generalization in planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Predictive Adversarial Imitation Learning
Planning-based agent replaces IRL policy
End-to-end interactive observation-only learning
Tyler Han
Graduate Student, University of Washington
robotics, imitation learning, controls

Yanda Bao
University of Washington

Bhaumik Mehta
University of Washington

Gabriel Guo
University of Washington

Anubhav Vishwakarma
University of Washington

Emily Kang
University of Washington

Sanghun Jung
PhD Student in CSE at University of Washington
Robotics, Autonomous Driving, Computer Vision

Rosario Scalise
University of Washington
Artificial Intelligence, Robotics, Machine Learning, Optimal Control, NLP

Jason Zhou
University of Washington

Bryan Xu
University of Washington

Byron Boots
Associate Professor, University of Washington
Machine Learning, Artificial Intelligence, Robotics