Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing imitation learning approaches rely heavily on high-quality expert demonstrations, which limits generalization to diverse object configurations and initial states in real-world scenarios and leaves non-expert data (e.g., gameplay recordings, partial trajectories, or suboptimal rollouts) unexploited. This work proposes a unified offline reinforcement learning and imitation learning framework that jointly optimizes a behavior cloning objective and a value function. With minimal algorithmic modifications, the method trains stably even from sparse, heterogeneous non-expert data. Crucially, it expands the support of the learned policy distribution, significantly improving robustness and recovery capability when demonstrations are incomplete or suboptimal. Evaluated on real-robot manipulation tasks, the approach increases the range of applicable initial conditions by 42%, demonstrating its effectiveness and practicality in complex, realistic environments.

📝 Abstract
Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific data, restricting adaptability to the diverse range of real-world object configurations and scenarios. In contrast, non-expert data -- such as play data, suboptimal demonstrations, partial task completions, or rollouts from suboptimal policies -- can offer broader coverage and lower collection costs, yet conventional imitation learning approaches fail to utilize this data effectively. To address these challenges, we posit that with the right design decisions, offline reinforcement learning can be used as a tool to harness non-expert data to enhance the performance of imitation learning policies. We show that while standard offline RL approaches can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, simple algorithmic modifications can allow for the utilization of this data, without significant additional assumptions. Our approach shows that broadening the support of the policy distribution can allow imitation algorithms augmented by offline RL to solve tasks robustly, with considerably enhanced recovery and generalization behavior. In manipulation tasks, these innovations significantly increase the range of initial conditions where learned policies are successful when non-expert data is incorporated. Moreover, we show that these methods are able to leverage all collected data, including partial or suboptimal demonstrations, to bolster task-directed policy performance. This underscores the importance of algorithmic techniques for using non-expert data for robust policy learning in robotics.
Problem

Research questions and friction points this paper is trying to address.

Utilizing non-expert data to enhance imitation learning robustness
Addressing limitations of expert-dependent imitation learning approaches
Improving policy generalization through offline reinforcement learning techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses offline reinforcement learning to augment imitation learning
Enhances policy robustness by incorporating non-expert data
Applies minimal algorithmic modifications to exploit partial and suboptimal demonstrations
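One common way to realize the joint behavior-cloning-plus-value-function idea described above is advantage-weighted behavior cloning: a learned value function scores each transition, and cloning weights grow with estimated advantage, so useful non-expert actions are retained while poor ones are down-weighted. The sketch below is illustrative only; the function names, temperature `beta`, and clipping bound are assumptions, not the paper's exact objective.

```python
import numpy as np

def awbc_weights(q_values, v_values, beta=1.0):
    """Exponential advantage weights (clipped for numerical stability).

    Transitions whose actions look better than the dataset average
    (positive advantage) receive larger behavior-cloning weight,
    which lets non-expert data contribute where it is useful.
    """
    advantages = q_values - v_values
    return np.exp(np.clip(advantages / beta, -10.0, 10.0))

def weighted_bc_loss(pred_actions, data_actions, weights):
    """Per-transition squared cloning error, re-weighted by advantage."""
    per_sample = np.sum((pred_actions - data_actions) ** 2, axis=-1)
    return float(np.mean(weights * per_sample))

# Toy usage: two transitions, the first with higher estimated advantage.
q = np.array([2.0, 1.0])
v = np.array([1.0, 1.0])
w = awbc_weights(q, v, beta=1.0)          # first weight > second weight
loss = weighted_bc_loss(np.zeros((2, 1)), np.ones((2, 1)), w)
```

Because the weights never hit exactly zero, every collected transition, including partial or suboptimal ones, retains some influence, which is one mechanism for broadening the support of the learned policy distribution.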