AI Summary
This work addresses the unreliability of learning planning behaviors from ambiguous and incomplete human demonstrations. We propose an end-to-end imitation learning framework that unifies inverse reinforcement learning (IRL) with model predictive control (MPC). Its core innovation lies in replacing the conventional black-box policy with an interpretable, planning-based policy, thereby bridging adversarial imitation learning and explicit planning paradigms. The method models the reward function and generates robust planning trajectories from as few as a single demonstration, achieving high sample efficiency while significantly improving out-of-distribution generalization and robustness. We validate its effectiveness on both simulated control benchmarks and real-world navigation tasks. Our approach offers a novel pathway toward safe, interpretable, and data-efficient learning for autonomous agents.
Abstract
Human demonstration data is often ambiguous and incomplete, motivating imitation learning approaches that also exhibit reliable planning behavior. A common paradigm for planning-from-demonstration involves learning a reward function via Inverse Reinforcement Learning (IRL) and then deploying this reward via Model Predictive Control (MPC). Towards unifying these methods, we derive a formulation that replaces the policy in IRL with a planning-based agent. Through connections to Adversarial Imitation Learning, this formulation enables end-to-end interactive learning of planners from observation-only demonstrations. In addition to benefits in interpretability, complexity, and safety, we observe significant improvements in sample efficiency, out-of-distribution generalization, and robustness. The study includes evaluations on both simulated control benchmarks and real-world navigation experiments using few-to-single observation-only demonstrations.
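The coupling the abstract describes, using a planner as the policy inside an IRL loop, can be sketched in a toy form. The code below is a minimal illustrative sketch, not the paper's algorithm: it assumes a 1-D point-mass task, a linear reward over hand-picked features, feature-matching IRL updates, and random-shooting MPC as the planner. All names and the task setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s):
    # Toy feature: negative absolute distance to a goal at the origin.
    return np.array([-abs(s)])

def mpc_plan(s, w, horizon=5, n_samples=64):
    """Random-shooting MPC: sample action sequences, score them under the
    current learned reward w @ features(s), return the best first action."""
    best_a, best_ret = 0.0, -np.inf
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, horizon)
        sim, ret = s, 0.0
        for a in actions:
            sim = sim + a
            ret += w @ features(sim)
        if ret > best_ret:
            best_ret, best_a = ret, actions[0]
    return best_a

# A single observation-only demonstration: states moving toward the goal.
demo_states = [4.0, 3.0, 2.0, 1.0, 0.0]
mu_demo = np.mean([features(s) for s in demo_states], axis=0)

# IRL loop: roll out the MPC policy under the current reward, then nudge
# the reward weights toward matching the demonstration's feature counts.
w = np.zeros(1)
for _ in range(20):
    s, agent_feats = 4.0, []
    for _ in range(len(demo_states)):
        agent_feats.append(features(s))
        s = s + mpc_plan(s, w)
    mu_agent = np.mean(agent_feats, axis=0)
    w += 0.5 * (mu_demo - mu_agent)
```

After a few iterations the learned weight becomes positive, i.e., the reward favors approaching the goal and the MPC rollouts begin to resemble the demonstration. The point of the sketch is only the structure: the planner, not a black-box policy, sits inside the reward-learning loop.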