Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

📅 2026-03-23
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work proposes Dual Q-DM, a non-adversarial imitation learning method that addresses the compounding errors and poor generalization that existing approaches such as IQ-Learn exhibit on states not covered by expert demonstrations. By introducing Bellman consistency constraints into a non-adversarial, primal-dual distribution-matching framework with Q-function optimization, the method propagates high Q-values from demonstrated states to unvisited ones, enabling generalization beyond the demonstrations. Theoretically, Dual Q-DM is proven equivalent to adversarial imitation learning: it recovers the expert policy and eliminates compounding errors entirely. Empirical results across multiple tasks show that it significantly outperforms current non-adversarial imitation learning algorithms while retaining their stability and sample efficiency.
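To make the propagation mechanism concrete, below is a minimal, hypothetical sketch in a toy chain MDP. It is not the paper's algorithm: the MDP, the surrogate reward, and every name in it are invented for illustration. It only shows why a Bellman backup lets value reach states the demonstrations never visit, whereas an objective that merely raises Q on demonstrated pairs leaves uncovered states flat.

```python
# Minimal, hypothetical sketch (NOT the paper's algorithm): a tabular chain
# MDP showing how a Bellman-consistency term propagates high Q-values from
# demonstration-covered states to uncovered ones. All quantities below are
# invented for illustration.
import numpy as np

n_states, gamma = 6, 0.9

def step(s, a):
    """Deterministic chain dynamics: a=1 moves right, a=0 moves left."""
    return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

# Expert demonstrations cover only states {3, 4, 5}, always moving right.
demo = {(3, 1), (4, 1), (5, 1)}

# (a) No Bellman constraint: push Q up on demonstrated pairs only.
#     Q stays flat (zero) everywhere the expert never visited.
q_flat = np.zeros((n_states, 2))
for s, a in demo:
    q_flat[s, a] = 1.0

# (b) With Bellman consistency: treat demo membership as a surrogate
#     reward and run Bellman backups over ALL states, visited or not.
q_bellman = np.zeros((n_states, 2))
for _ in range(200):  # enough sweeps for gamma=0.9 to converge
    for s in range(n_states):
        for a in range(2):
            r = 1.0 if (s, a) in demo else 0.0
            q_bellman[s, a] = r + gamma * q_bellman[step(s, a)].max()

# Greedy action at the UNVISITED state s=0:
print("without Bellman:", q_flat[0])     # [0. 0.] -> no preference
print("with Bellman:   ", q_bellman[0])  # right action dominates
```

At the uncovered state s=0, the flat Q gives the greedy policy no signal, while the backed-up Q prefers moving right toward the demonstrated region; this is the qualitative gap that the paper's analysis formalizes.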

📝 Abstract
Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating the compounding errors of behavioral cloning (BC), but often exhibits training instability due to adversarial optimization. To avoid this issue, a class of non-adversarial Q-based imitation learning (IL) methods, represented by IQ-Learn, has emerged and is widely believed to outperform BC by leveraging online environment interactions. However, this paper revisits IQ-Learn and demonstrates that it provably reduces to BC and suffers from an imitation-gap lower bound with quadratic dependence on the horizon, and thus still suffers from compounding errors. Theoretical analysis reveals that, despite using online interactions, IQ-Learn uniformly suppresses the Q-values of all actions on states uncovered by demonstrations, and therefore fails to generalize. To address this limitation, we introduce a primal-dual framework for distribution matching, yielding a new Q-based IL method, Dual Q-DM. The key mechanism in Dual Q-DM is the incorporation of Bellman constraints that propagate high Q-values from visited states to unvisited ones, thereby achieving generalization beyond the demonstrations. We prove that Dual Q-DM is equivalent to AIL and can recover expert actions beyond the demonstrations, thereby mitigating compounding errors. To the best of our knowledge, Dual Q-DM is the first non-adversarial IL method that is theoretically guaranteed to eliminate compounding errors. Experimental results corroborate our theoretical findings.
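For orientation, the primal-dual distribution-matching setup the abstract refers to is conventionally written as an occupancy-matching program with Bellman flow constraints. The sketch below uses standard notation and is an assumption about the general formulation, not an excerpt from the paper:

```latex
% A sketch in conventional occupancy-matching notation (not the paper's
% own formulas): distribution matching subject to Bellman flow constraints.
\begin{align*}
\max_{d \,\ge\, 0} \quad & -\, D_f\!\left(d \,\middle\|\, d^{E}\right) \\
\text{s.t.} \quad & \sum_{a} d(s,a)
   \;=\; (1-\gamma)\,\rho_0(s)
   \;+\; \gamma \sum_{s',\,a'} P(s \mid s', a')\, d(s', a')
   \qquad \forall s,
\end{align*}
% where d is the learner's state-action occupancy measure, d^E the
% expert's, D_f an f-divergence, rho_0 the initial-state distribution,
% and P the transition kernel.
```

Dualizing the flow constraints introduces value-function multipliers, turning the constrained program into an unconstrained objective over Q. The constraints are also what couples demonstration-covered states to uncovered ones, which is where a non-adversarial Q-based method can gain generalization.
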
Problem

Research questions and friction points the paper addresses:

compounding errors · imitation learning · non-adversarial · generalization · Bellman constraints

Innovation

Methods, ideas, and system contributions that make the work stand out:

non-adversarial imitation learning · compounding errors · Bellman constraints · distribution matching · Q-learning

👥 Authors
Tian Xu, Nanjing University (Reinforcement Learning)
Chenyang Wang, National Key Laboratory for Novel Software Technology & School of Artificial Intelligence, Nanjing University, China
Xiaochen Zhai, National Key Laboratory for Novel Software Technology & School of Artificial Intelligence, Nanjing University, China
Ziniu Li, The Chinese University of Hong Kong, Shenzhen (Machine Learning · Reinforcement Learning · Large Language Models)
Yi-Chen Li, Nanjing University (Reinforcement Learning · Imitation Learning · RLHF)
Yang Yu, Professor, Nanjing University (Artificial Intelligence · Reinforcement Learning · Evolutionary Algorithms)