Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm

📅 2025-07-09

📈 Citations: 0

✨ Influential: 0

career value

199K/year

🤖 AI Summary

This work addresses safety-constrained imitation learning, aiming to learn a maximum-entropy policy from expert demonstrations while satisfying multiple constraints (e.g., safety requirements and task specifications). We propose a novel probabilistic inference–based imitation learning framework that formulates constraint satisfaction as a KL-divergence–constrained optimization problem, yielding a unified objective accommodating both hard and soft constraints. To jointly optimize policy entropy and constraint violation cost, we introduce a dual gradient descent algorithm with theoretical convergence guarantees. Theoretically, we establish a rigorous equivalence between constrained imitation learning and constrained maximum-entropy reinforcement learning. Empirically, our method achieves significant improvements over state-of-the-art baselines across diverse simulated and real-world robotic tasks, demonstrating superior generalization, behavioral diversity, and robustness to constraint violations.

Technology Category

Application Category

📝 Abstract

This article introduces an imitation learning method for learning maximum entropy policies that comply with constraints demonstrated by expert trajectories executing a task. The formulation of the method takes advantage of results connecting performance to bounds for the KL-divergence between demonstrated and learned policies, and its objective is rigorously justified through a connection to a probabilistic inference framework for reinforcement learning, incorporating the reinforcement learning objective and the objective to abide by constraints in an entropy maximization setting. The proposed algorithm optimizes the learning objective with dual gradient descent, supporting effective and stable training. Experiments show that the proposed method can learn effective policy models for constraints-abiding behaviour, in settings with multiple constraints of different types, accommodating different modalities of demonstrated behaviour, and with abilities to generalize.

Problem

Research questions and friction points this paper is trying to address.

Learning safe policies via imitation with expert constraints

Connecting imitation learning to probabilistic inference framework

Optimizing constraints-abiding policies using dual gradient descent

Innovation

Methods, ideas, or system contributions that make the work stand out.

Imitation learning for maximum entropy policies

Dual gradient descent optimization algorithm

Probabilistic inference framework integration

🔎 Similar Papers

No similar papers found.