🤖 AI Summary
This work addresses safety-constrained imitation learning, aiming to learn a maximum-entropy policy from expert demonstrations while satisfying multiple constraints (e.g., safety requirements and task specifications). We propose a novel probabilistic inference–based imitation learning framework that formulates constraint satisfaction as a KL-divergence–constrained optimization problem, yielding a unified objective accommodating both hard and soft constraints. To jointly optimize policy entropy and constraint violation cost, we introduce a dual gradient descent algorithm with theoretical convergence guarantees. Theoretically, we establish a rigorous equivalence between constrained imitation learning and constrained maximum-entropy reinforcement learning. Empirically, our method achieves significant improvements over state-of-the-art baselines across diverse simulated and real-world robotic tasks, demonstrating superior generalization, behavioral diversity, and robustness to constraint violations.
📝 Abstract
This article introduces an imitation learning method for learning maximum entropy policies that comply with constraints demonstrated by expert trajectories executing a task. The formulation of the method takes advantage of results connecting performance to bounds for the KL-divergence between demonstrated and learned policies, and its objective is rigorously justified through a connection to a probabilistic inference framework for reinforcement learning, incorporating the reinforcement learning objective and the objective to abide by constraints in an entropy maximization setting. The proposed algorithm optimizes the learning objective with dual gradient descent, supporting effective and stable training. Experiments show that the proposed method can learn effective policy models for constraints-abiding behaviour, in settings with multiple constraints of different types, accommodating different modalities of demonstrated behaviour, and with abilities to generalize.