Your Learned Constraint is Secretly a Backward Reachable Tube

📅 2025-01-26
🤖 AI Summary
This work investigates what Inverse Constraint Learning (ICL) actually recovers in safety-aware imitation learning. We find that ICL does not reconstruct the set of states where failure has already occurred; instead, it implicitly learns a backward reachable tube (BRT), the set of states from which failure is inevitable. This insight is established for the first time via theoretical analysis and empirical validation. Crucially, we demonstrate that the inferred constraints depend strongly on the dynamics of the data-collection system, which challenges the conventional assumption that constraints learned by ICL generalize across dynamics. Methodologically, we integrate safety control theory, counterfactual reasoning, and trajectory-driven constraint inference, formalizing the BRT through the Hamilton–Jacobi partial differential equation. Experiments across multiple continuous-control benchmarks confirm the fidelity of the BRT interpretation. Furthermore, constraint transfer across dynamical systems incurs up to 47% performance degradation, yielding the first theoretical criterion for ICL’s sample efficiency and cross-dynamics transferability.
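For readers unfamiliar with the term, the standard safe-control definition of a backward reachable tube can be sketched as follows. The notation is an assumption for illustration and is not taken from the paper: dynamics $\dot{x} = f(x, u)$, trajectory $\xi_{x,t}^{u}$ starting from state $x$ at time $t$ under control signal $u(\cdot)$, horizon $T$, and a failure set $\mathcal{F} = \{x : \ell(x) \le 0\}$ described by a function $\ell$.

```latex
% BRT: states from which every admissible control signal eventually enters F.
\mathrm{BRT}(t) = \bigl\{\, x \;:\; \forall u(\cdot),\ \exists s \in [t, T],\ \xi_{x,t}^{u}(s) \in \mathcal{F} \,\bigr\}

% Value function: the best-case (over controls) closest approach to failure.
V(x, t) = \max_{u(\cdot)} \, \min_{s \in [t, T]} \ell\bigl(\xi_{x,t}^{u}(s)\bigr),
\qquad \mathrm{BRT}(t) = \{\, x : V(x, t) \le 0 \,\}

% Hamilton-Jacobi variational inequality satisfied by V.
\min\Bigl\{\, \ell(x) - V(x, t),\ \ \frac{\partial V}{\partial t}(x, t) + \max_{u} \nabla_x V(x, t) \cdot f(x, u) \,\Bigr\} = 0,
\qquad V(x, T) = \ell(x)
```

The dynamics $f$ appear explicitly in the last line, which is why the BRT, unlike $\mathcal{F}$ itself, changes when the system changes.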

📝 Abstract
Inverse Constraint Learning (ICL) is the problem of inferring constraints from safe (i.e., constraint-satisfying) demonstrations. The hope is that these inferred constraints can then be used downstream to search for safe policies for new tasks and, potentially, under different dynamics. Our paper explores the question of what mathematical entity ICL recovers. Somewhat surprisingly, we show that both in theory and in practice, ICL recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened. In the language of safe control, this means we recover a backward reachable tube (BRT) rather than a failure set. In contrast to the failure set, the BRT depends on the dynamics of the data collection system. We discuss the implications of this dependence of the recovered constraint on the dynamics for both the sample-efficiency of policy search and the transferability of learned constraints.
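To make the distinction between a failure set and a BRT concrete, here is a minimal sketch on a toy problem. It is not the paper's method or one of its benchmarks. The setup is assumed purely for illustration: a cart on an integer line moves toward a wall, the failure set is "the cart is at the wall", and the only difference between the two systems is a hypothetical `max_brake` parameter. The BRT is computed by a simple fixed-point iteration over a finite state space rather than by solving a Hamilton–Jacobi equation.

```python
def step(state, accel, wall, vmax):
    """Toy dynamics (assumed): move by the current velocity, then change velocity."""
    x, v = state
    next_x = min(x + v, wall)              # the cart cannot pass the wall
    next_v = max(0, min(vmax, v + accel))  # velocity stays in [0, vmax]
    return (next_x, next_v)


def failure_and_brt(wall, vmax, max_brake):
    """Return (failure set, backward reachable tube) for the toy cart.

    A state joins the BRT when every available action leads into the
    current BRT, i.e. failure can no longer be avoided.
    """
    states = [(x, v) for x in range(wall + 1) for v in range(vmax + 1)]
    actions = range(-max_brake, 2)         # brake by up to max_brake, or speed up by 1
    failure = {s for s in states if s[0] == wall}
    brt = set(failure)
    while True:
        doomed = {
            s for s in states
            if s not in brt
            and all(step(s, a, wall, vmax) in brt for a in actions)
        }
        if not doomed:                     # fixed point reached
            return failure, brt
        brt |= doomed


# Same failure set, two different braking capabilities (both hypothetical).
failure_weak, brt_weak = failure_and_brt(wall=10, vmax=3, max_brake=1)
failure_strong, brt_strong = failure_and_brt(wall=10, vmax=3, max_brake=3)

# States that are doomed only for the weak-brake system.
only_weak = sorted(brt_weak - brt_strong)
```

In this toy setting the two systems share a failure set, but the weak-brake system has a strictly larger BRT. A constraint inferred from weak-brake demonstrations would therefore be overly conservative for the strong-brake system, and one inferred from strong-brake demonstrations would be unsafe for the weak-brake system.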
Problem

Research questions and friction points this paper is trying to address.

Inverse Constraint Learning
Safety Demonstrations
Adaptability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inverse Constraint Learning
Backward Reachable Tubes
Safe Policy Discovery