Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Imitation learning suffers from low sample efficiency and difficulty in surpassing expert performance. To address these challenges, we propose a dual-exploration framework: (1) an optimism-driven objective constructed from policy uncertainty to accelerate convergence toward expert behavior; and (2) a curiosity-based exploration reward that actively visits state regions unobserved in expert demonstrations, enabling performance breakthroughs. Our method integrates uncertainty-regularized policy optimization within a reinforcement learning framework. Evaluated on Atari and MuJoCo benchmarks, it achieves superior performance using only a small number of expert demonstrations—significantly outperforming existing state-of-the-art methods and attaining super-expert-level results. Theoretical analysis establishes a sublinear regret bound with respect to the number of episodes, providing a principled foundation for efficient imitation learning. This work introduces a novel paradigm that jointly leverages epistemic uncertainty and intrinsic motivation to bridge the gap between imitation and autonomous improvement.
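The dual-exploration idea can be illustrated with a small sketch. This is not the authors' implementation; the function names, the ensemble-disagreement proxy for epistemic uncertainty, and the nearest-demonstration-state distance used as the curiosity signal are all our own illustrative assumptions about how such a shaped objective could look:

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_disagreement(state, q_ensemble):
    """Optimism bonus: disagreement (std. dev.) of an ensemble's value
    predictions, a common proxy for epistemic uncertainty."""
    preds = np.array([q(state) for q in q_ensemble])
    return preds.std()

def curiosity_bonus(state, demo_states):
    """Curiosity bonus: distance from the nearest demonstration state,
    so states off the demonstration trajectories are rewarded."""
    dists = np.linalg.norm(demo_states - state, axis=1)
    return dists.min()

def shaped_reward(imitation_reward, state, q_ensemble, demo_states,
                  beta_opt=0.1, beta_cur=0.05):
    # Total reward = imitation term + optimism bonus + curiosity bonus.
    # beta_opt and beta_cur are hypothetical mixing coefficients.
    return (imitation_reward
            + beta_opt * ensemble_disagreement(state, q_ensemble)
            + beta_cur * curiosity_bonus(state, demo_states))

# Toy usage: three random linear critics on a 4-dimensional state,
# and 16 random "demonstration" states.
q_ensemble = [lambda s, w=rng.normal(size=4): float(w @ s) for _ in range(3)]
demo_states = rng.normal(size=(16, 4))
r = shaped_reward(imitation_reward=1.0, state=rng.normal(size=4),
                  q_ensemble=q_ensemble, demo_states=demo_states)
```

Since both bonuses are non-negative, the shaped reward never falls below the imitation term; in the actual algorithm these bonuses would feed into the policy-optimization objective rather than being computed per state in isolation.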

📝 Abstract
Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert's behavior. In practice, it is often challenging to learn the expert policy from a limited number of demonstrations accurately due to the complexity of the state space. Moreover, it is essential to explore the environment and collect data to achieve beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty to potentially improve the convergence to the expert policy, and (2) curiosity-driven exploration of the states that deviate from the demonstration trajectories to potentially yield beyond-expert performance. Empirically, we demonstrate that ILDE outperforms the state-of-the-art imitation learning algorithms in terms of sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work. We also provide a theoretical justification of ILDE as an uncertainty-regularized policy optimization method with optimistic exploration, leading to a regret growing sublinearly in the number of episodes.
Problem

Research questions and friction points this paper is trying to address.

- Efficiently mimic the expert policy from a limited number of demonstrations
- Explore the environment to achieve beyond-expert performance
- Overcome state-space complexity in imitation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Double exploration for sample-efficient imitation learning
- Optimistic policy optimization with uncertainty-based exploration bonuses
- Curiosity-driven exploration for beyond-expert performance
Heyang Zhao — UCLA — Machine Learning
Xingrui Yu — Scientist, CFAR, A*STAR — Machine Learning; Robust Imitation Learning; Trustworthy AI
David M. Bossens — IHPC and CFAR, Agency for Science, Technology and Research (A*STAR), Singapore
Ivor W. Tsang — IHPC and CFAR, Agency for Science, Technology and Research (A*STAR), Singapore
Quanquan Gu — Associate Professor of Computer Science, UCLA — AGI; Large Language Models; Reinforcement Learning; Nonconvex Optimization