Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

📅 2026-03-23
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work proposes Dual Q-DM, a non-adversarial imitation learning method that addresses the compounding errors and poor generalization that existing approaches such as IQ-Learn exhibit on states not covered by expert demonstrations. By introducing Bellman consistency constraints into a non-adversarial, primal-dual distribution-matching framework with Q-function optimization, the method propagates high Q-values from demonstrated states to unvisited ones, enabling generalization beyond the demonstrations. Theoretically, Dual Q-DM is proven equivalent to adversarial imitation learning: it recovers the expert policy and eliminates compounding errors entirely. Empirical results across multiple tasks show that it significantly outperforms current non-adversarial imitation learning algorithms while retaining their stability and sample efficiency.
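To make the propagation mechanism concrete, below is a minimal, hypothetical sketch in a toy chain MDP. It is not the paper's algorithm: the MDP, the surrogate reward, and every name in it are invented for illustration. It only shows why a Bellman backup lets value reach states the demonstrations never visit, whereas an objective that merely raises Q on demonstrated pairs leaves uncovered states flat.

```python
# Minimal, hypothetical sketch (NOT the paper's algorithm): a tabular chain
# MDP showing how a Bellman-consistency term propagates high Q-values from
# demonstration-covered states to uncovered ones. All quantities below are
# invented for illustration.
import numpy as np

n_states, gamma = 6, 0.9

def step(s, a):
    """Deterministic chain dynamics: a=1 moves right, a=0 moves left."""
    return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

# Expert demonstrations cover only states {3, 4, 5}, always moving right.
demo = {(3, 1), (4, 1), (5, 1)}

# (a) No Bellman constraint: push Q up on demonstrated pairs only.
#     Q stays flat (zero) everywhere the expert never visited.
q_flat = np.zeros((n_states, 2))
for s, a in demo:
    q_flat[s, a] = 1.0

# (b) With Bellman consistency: treat demo membership as a surrogate
#     reward and run Bellman backups over ALL states, visited or not.
q_bellman = np.zeros((n_states, 2))
for _ in range(200):  # enough sweeps for gamma=0.9 to converge
    for s in range(n_states):
        for a in range(2):
            r = 1.0 if (s, a) in demo else 0.0
            q_bellman[s, a] = r + gamma * q_bellman[step(s, a)].max()

# Greedy action at the UNVISITED state s=0:
print("without Bellman:", q_flat[0])     # [0. 0.] -> no preference
print("with Bellman:   ", q_bellman[0])  # right action dominates
```

At the uncovered state s=0, the flat Q gives the greedy policy no signal, while the backed-up Q prefers moving right toward the demonstrated region; this is the qualitative gap that the paper's analysis formalizes.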

📝 Abstract
Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating the compounding errors of behavioral cloning (BC), but often exhibits training instability due to adversarial optimization. To avoid this issue, a class of non-adversarial Q-based imitation learning (IL) methods, represented by IQ-Learn, has emerged and is widely believed to outperform BC by leveraging online environment interactions. However, this paper revisits IQ-Learn and demonstrates that it provably reduces to BC and suffers from an imitation-gap lower bound with quadratic dependence on the horizon, and thus still suffers from compounding errors. Theoretical analysis reveals that, despite using online interactions, IQ-Learn uniformly suppresses the Q-values of all actions on states uncovered by demonstrations, and therefore fails to generalize. To address this limitation, we introduce a primal-dual framework for distribution matching, yielding a new Q-based IL method, Dual Q-DM. The key mechanism in Dual Q-DM is the incorporation of Bellman constraints that propagate high Q-values from visited states to unvisited ones, thereby achieving generalization beyond the demonstrations. We prove that Dual Q-DM is equivalent to AIL and can recover expert actions beyond the demonstrations, thereby mitigating compounding errors. To the best of our knowledge, Dual Q-DM is the first non-adversarial IL method that is theoretically guaranteed to eliminate compounding errors. Experimental results corroborate our theoretical findings.
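For orientation, the primal-dual distribution-matching setup the abstract refers to is conventionally written as an occupancy-matching program with Bellman flow constraints. The sketch below uses standard notation and is an assumption about the general formulation, not an excerpt from the paper:

```latex
% A sketch in conventional occupancy-matching notation (not the paper's
% own formulas): distribution matching subject to Bellman flow constraints.
\begin{align*}
\max_{d \,\ge\, 0} \quad & -\, D_f\!\left(d \,\middle\|\, d^{E}\right) \\
\text{s.t.} \quad & \sum_{a} d(s,a)
   \;=\; (1-\gamma)\,\rho_0(s)
   \;+\; \gamma \sum_{s',\,a'} P(s \mid s', a')\, d(s', a')
   \qquad \forall s,
\end{align*}
% where d is the learner's state-action occupancy measure, d^E the
% expert's, D_f an f-divergence, rho_0 the initial-state distribution,
% and P the transition kernel.
```

Dualizing the flow constraints introduces value-function multipliers, turning the constrained program into an unconstrained objective over Q. The constraints are also what couples demonstration-covered states to uncovered ones, which is where a non-adversarial Q-based method can gain generalization.
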
Problem

Research questions and friction points the paper addresses:

compounding errors · imitation learning · non-adversarial · generalization · Bellman constraints

Innovation

Methods, ideas, and system contributions that make the work stand out:

non-adversarial imitation learning · compounding errors · Bellman constraints · distribution matching · Q-learning

👥 Authors
Tian Xu, Nanjing University (Reinforcement Learning)
Chenyang Wang, National Key Laboratory for Novel Software Technology & School of Artificial Intelligence, Nanjing University, China
Xiaochen Zhai, National Key Laboratory for Novel Software Technology & School of Artificial Intelligence, Nanjing University, China
Ziniu Li, The Chinese University of Hong Kong, Shenzhen (Machine Learning · Reinforcement Learning · Large Language Models)
Yi-Chen Li, Nanjing University (Reinforcement Learning · Imitation Learning · RLHF)
Yang Yu, Professor, Nanjing University (Artificial Intelligence · Reinforcement Learning · Evolutionary Algorithms)