Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm

📅 2025-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses safety-constrained imitation learning, aiming to learn a maximum-entropy policy from expert demonstrations while satisfying multiple constraints (e.g., safety requirements and task specifications). We propose a novel probabilistic inference–based imitation learning framework that formulates constraint satisfaction as a KL-divergence–constrained optimization problem, yielding a unified objective accommodating both hard and soft constraints. To jointly optimize policy entropy and constraint violation cost, we introduce a dual gradient descent algorithm with theoretical convergence guarantees. Theoretically, we establish a rigorous equivalence between constrained imitation learning and constrained maximum-entropy reinforcement learning. Empirically, our method achieves significant improvements over state-of-the-art baselines across diverse simulated and real-world robotic tasks, demonstrating superior generalization, behavioral diversity, and robustness to constraint violations.
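The KL-constrained formulation described above can be sketched as follows (the notation here is ours, not the paper's: $\pi_E$ is the expert policy, $c_i$ are per-step constraint cost functions with budgets $d_i$, and $\mathcal{H}$ denotes policy entropy):

```latex
\max_{\pi}\; \mathcal{H}(\pi)
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi_E\right) \le \epsilon,
\qquad
\mathbb{E}_{\pi}\!\left[c_i(s,a)\right] \le d_i,\;\; i = 1,\dots,k
```

A dual gradient descent scheme then alternates updates of $\pi$ with ascent on the Lagrange multipliers attached to these constraints; letting a multiplier grow unboundedly recovers hard-constraint behaviour, while bounding it gives a soft constraint, which is one way such a formulation can accommodate both.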

📝 Abstract
This article introduces an imitation learning method for learning maximum-entropy policies that comply with constraints demonstrated by expert trajectories executing a task. The formulation exploits results that bound policy performance by the KL-divergence between the demonstrated and learned policies, and its objective is rigorously justified through a connection to a probabilistic-inference framework for reinforcement learning, which incorporates both the reinforcement learning objective and the objective of abiding by constraints in an entropy-maximization setting. The proposed algorithm optimizes the learning objective with dual gradient descent, supporting effective and stable training. Experiments show that the method learns effective policy models for constraint-abiding behaviour in settings with multiple constraints of different types, accommodates different modalities of demonstrated behaviour, and generalizes well.
Problem

Research questions and friction points this paper is trying to address.

Learning safe policies via imitation with expert constraints
Connecting imitation learning to probabilistic inference framework
Optimizing constraint-abiding policies using dual gradient descent
Innovation

Methods, ideas, or system contributions that make the work stand out.

Imitation learning for maximum entropy policies
Dual gradient descent optimization algorithm
Probabilistic inference framework integration
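The dual gradient descent idea listed above can be illustrated on a toy problem. A 1-D quadratic stands in for the policy objective here; the specific functions, step size, and iteration count are illustrative choices of ours, not details from the paper:

```python
# Dual gradient descent on a toy constrained problem:
#   minimize  f(x) = (x - 3)^2   subject to   g(x) = x - 1 <= 0
# Lagrangian: L(x, lam) = f(x) + lam * g(x), with lam >= 0.

def solve_primal(lam):
    # For fixed lam, L is quadratic in x, so the inner minimization
    # has a closed form: dL/dx = 2(x - 3) + lam = 0  ->  x = 3 - lam / 2.
    return 3.0 - lam / 2.0

def dual_gradient_ascent(steps=100, lr=0.5):
    lam = 0.0
    for _ in range(steps):
        x = solve_primal(lam)                  # primal step: minimize L over x
        lam = max(0.0, lam + lr * (x - 1.0))   # dual step: ascend on g(x), project to lam >= 0
    return solve_primal(lam), lam

x_opt, lam_opt = dual_gradient_ascent()
# Converges to the constrained optimum x = 1 with multiplier lam = 4.
```

In the paper's setting the primal step would instead be a gradient update of the policy parameters on the Lagrangian, with one multiplier per constraint, but the alternation between primal descent and projected dual ascent is the same.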
George Papadopoulos
PhD Candidate, University of Piraeus
Human-Agents Collaboration, Deep Reinforcement Learning, Explainability
George A. Vouros
Department of Digital Systems, University of Piraeus, Piraeus, Greece