🤖 AI Summary
Imitation learning suffers from low sample efficiency and difficulty in surpassing expert performance. To address these challenges, we propose a dual-exploration framework: (1) an optimism-driven objective, constructed from policy uncertainty, that accelerates convergence toward expert behavior; and (2) a curiosity-based exploration reward that actively visits state regions unobserved in the expert demonstrations, enabling the learner to surpass the expert. Our method integrates uncertainty-regularized policy optimization within a reinforcement learning framework. Evaluated on Atari and MuJoCo benchmarks, it achieves superior sample efficiency with only a small number of expert demonstrations, significantly outperforming existing state-of-the-art methods and attaining beyond-expert performance. Theoretical analysis establishes a regret bound that grows sublinearly in the number of episodes, providing a principled foundation for efficient imitation learning. This work introduces a novel paradigm that jointly leverages epistemic uncertainty and intrinsic motivation to bridge the gap between imitation and autonomous improvement.
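As a rough schematic of the dual-exploration objective (the paper's exact bonus definitions are not reproduced in this summary, so the symbols $\hat r$, $b^{\mathrm{unc}}$, $b^{\mathrm{cur}}$ and the coefficients $\beta_1, \beta_2$ are illustrative), the method can be read as policy optimization on a reward shaped by two bonuses:

$$\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{H} \hat r(s_t, a_t) + \beta_1\, b^{\mathrm{unc}}(s_t, a_t) + \beta_2\, b^{\mathrm{cur}}(s_t)\right],$$

where $\hat r$ is the reward recovered from the demonstrations, $b^{\mathrm{unc}}$ is large for state-action pairs with high epistemic uncertainty (driving optimistic convergence toward the expert), and $b^{\mathrm{cur}}$ is large for states far from the demonstration trajectories (driving beyond-expert exploration).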
📝 Abstract
Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert's behavior. In practice, it is often challenging to accurately learn the expert policy from a limited number of demonstrations due to the complexity of the state space. Moreover, it is essential to explore the environment and collect additional data to achieve beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty, to potentially improve convergence to the expert policy, and (2) curiosity-driven exploration of states that deviate from the demonstration trajectories, to potentially yield beyond-expert performance. Empirically, we demonstrate that ILDE outperforms state-of-the-art imitation learning algorithms in sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work. We also provide a theoretical justification of ILDE as an uncertainty-regularized policy optimization method with optimistic exploration, leading to regret that grows sublinearly in the number of episodes.
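Below is a minimal sketch of how the two bonuses might be combined in practice, assuming an ensemble-disagreement uncertainty estimate and an RND-style curiosity signal; the function names, toy linear networks, and coefficients `beta1`/`beta2` are illustrative stand-ins, not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ENSEMBLE = 4, 5

# Toy ensemble of linear Q-estimates; disagreement among members serves
# as a stand-in for the epistemic uncertainty behind the optimism bonus.
q_weights = [rng.normal(size=STATE_DIM) for _ in range(N_ENSEMBLE)]

def uncertainty_bonus(state):
    preds = np.array([w @ state for w in q_weights])
    return preds.std()  # high disagreement => under-explored region

# RND-style curiosity: prediction error of a trainable network against a
# fixed random target is large on states unlike those seen in the demos.
target_w = rng.normal(size=(STATE_DIM, STATE_DIM))
pred_w = np.zeros((STATE_DIM, STATE_DIM))  # would be trained online

def curiosity_bonus(state):
    return float(np.sum((pred_w @ state - target_w @ state) ** 2))

def shaped_reward(r_imitation, state, beta1=0.1, beta2=0.05):
    """Double-exploration reward: imitation reward plus two bonuses."""
    return (r_imitation
            + beta1 * uncertainty_bonus(state)
            + beta2 * curiosity_bonus(state))

s = rng.normal(size=STATE_DIM)
print(shaped_reward(r_imitation=1.0, state=s))
```

The shaped reward would then be fed to any standard policy-optimization loop; as the uncertainty estimate tightens near the demonstrations, the optimism term fades while the curiosity term keeps directing exploration toward states the expert never visited.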