Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

📅 2024-09-24

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0


🤖 AI Summary

To address low constraint-inference accuracy, poor sample efficiency, and the lack of theoretical guarantees in Inverse Constrained Reinforcement Learning (ICRL), this paper first defines the feasible cost set for ICRL. It then relates estimation errors in the transition dynamics and the expert policy to the feasibility of the inferred constraints. Building on this, it proposes an active exploration framework with provable sample complexity. The framework has two algorithms, each with rigorous convergence guarantees. The first dynamically reduces the bounded aggregate error of the cost estimates. The second restricts the exploration policy to regions around plausibly optimal policies. The theoretical analysis derives an $O(1/\varepsilon^2)$ upper bound on sample complexity, in contrast to existing heuristic sampling strategies, which offer no such guarantee. Experiments across diverse simulated environments show improvements in both constraint-recovery accuracy and data efficiency.
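The $O(1/\varepsilon^2)$ dependence on the accuracy $\varepsilon$ is the familiar rate produced by concentration inequalities. The sketch below uses Hoeffding's inequality for a single empirical mean of $[0, 1]$-bounded samples. It illustrates only why halving $\varepsilon$ roughly quadruples the sample count. It is not the paper's bound, which also depends on problem-specific quantities. The function name `hoeffding_samples` is hypothetical.

```python
import math


def hoeffding_samples(epsilon: float, delta: float) -> int:
    """Samples needed so an empirical mean of [0, 1]-bounded i.i.d. variables
    lies within epsilon of the true mean with probability at least 1 - delta.

    Hoeffding: P(|mean - mu| >= eps) <= 2 * exp(-2 * n * eps^2),
    so n >= ln(2 / delta) / (2 * eps^2) suffices.
    """
    if not (epsilon > 0 and 0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Halving epsilon roughly quadruples the required number of samples.
for eps in (0.1, 0.05, 0.025):
    print(eps, hoeffding_samples(eps, delta=0.05))
```

With $\delta = 0.05$, this gives 185, 738, and 2952 samples for $\varepsilon = 0.1$, $0.05$, and $0.025$.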


📝 Abstract

Optimizing objective functions subject to constraints is fundamental in many real-world applications. However, these constraints are often not readily defined and must be inferred from expert agent behaviors, a problem known as Inverse Constraint Inference. Inverse Constrained Reinforcement Learning (ICRL) is a common solver for recovering feasible constraints in complex environments, relying on training samples collected from interactive environments. However, the efficacy and efficiency of current sampling strategies remain unclear. We propose a strategic exploration framework for sampling with guaranteed efficiency to bridge this gap. By defining the feasible cost set for ICRL problems, we analyze how estimation errors in transition dynamics and the expert policy influence the feasibility of inferred constraints. Based on this analysis, we introduce two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimations or 2) strategically constraining the exploration policy around plausibly optimal ones. Both algorithms are theoretically grounded with tractable sample complexity, and their performance is validated empirically across various environments.
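The first idea in the abstract is to spend samples where they most reduce a bound on the aggregate estimation error. A toy tabular sketch can make this concrete. It is not the paper's algorithm. It assumes a generative model that can be queried at any state-action pair, and it bounds only the transition estimates (not the expert policy) with Hoeffding-style widths. The per-pair `weights` are a hypothetical stand-in for how strongly errors at each pair affect the inferred cost. All names here (`conf_width`, `greedy_explore`, `true_P`, `weights`) are illustrative.

```python
import math
import random


def conf_width(n: int, log_term: float) -> float:
    """Hoeffding-style confidence width after n samples (infinite when unvisited)."""
    if n == 0:
        return math.inf
    return math.sqrt(log_term / (2 * n))


def greedy_explore(true_P, weights, budget, delta=0.1, seed=0):
    """Spend `budget` generative-model queries to shrink a weighted error bound.

    true_P[(s, a)]: next-state distribution (list of probabilities).
    weights[(s, a)]: hypothetical positive sensitivity of the cost estimate to (s, a).
    Aggregate bound: sum over pairs of weights[p] * conf_width(n(p)).
    Each step queries the pair whose next sample lowers that bound the most.
    """
    rng = random.Random(seed)
    pairs = list(true_P)
    log_term = math.log(2 * len(pairs) / delta)
    counts = {p: 0 for p in pairs}
    next_counts = {p: [0] * len(true_P[p]) for p in pairs}

    def gain(p):
        # Reduction in this pair's weighted width from one more sample.
        n = counts[p]
        return weights[p] * (conf_width(n, log_term) - conf_width(n + 1, log_term))

    for _ in range(budget):
        p = max(pairs, key=gain)  # unvisited pairs (infinite gain) come first
        s_next = rng.choices(range(len(true_P[p])), weights=true_P[p])[0]
        counts[p] += 1
        next_counts[p][s_next] += 1

    P_hat = {p: [c / counts[p] for c in next_counts[p]] if counts[p] else None
             for p in pairs}
    bound = sum(weights[p] * conf_width(counts[p], log_term) for p in pairs)
    return P_hat, counts, bound


# Toy problem: two states, two actions; errors at (s0, a0) matter most.
true_P = {
    ("s0", "a0"): [0.9, 0.1], ("s0", "a1"): [0.2, 0.8],
    ("s1", "a0"): [0.5, 0.5], ("s1", "a1"): [1.0, 0.0],
}
weights = {("s0", "a0"): 4.0, ("s0", "a1"): 1.0,
           ("s1", "a0"): 1.0, ("s1", "a1"): 1.0}
P_hat, counts, bound = greedy_explore(true_P, weights, budget=200)
print(counts, round(bound, 3))
```

Because each pair's marginal gain shrinks as it is sampled, this greedy rule visits every pair once and then allocates more queries to high-weight pairs. It does not simply split the budget uniformly.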

Problem

Research questions and pain points this paper addresses.

Inferring constraints from expert behaviors in complex environments

Ensuring efficient sampling for inverse constrained reinforcement learning

Reducing estimation errors in the transition dynamics and the expert policy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Strategic exploration framework for efficient sampling

Two exploratory algorithms for constraint inference

Tractable sample-complexity guarantees with empirical validation
