WM-DAgger: Enabling Efficient Data Aggregation for Imitation Learning with World Models

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In imitation learning, policy error accumulation often causes robots to deviate from the training distribution, leading to task failure, while conventional DAgger approaches rely on continuous human intervention, limiting their scalability. This work proposes WM-DAgger, a framework that leverages a world model to autonomously generate corrective data for out-of-distribution states without human involvement. The method incorporates a task-oriented corrective action synthesis module and a consistency-guided filtering mechanism to eliminate physically implausible trajectories, effectively mitigating hallucination issues inherent in world models. Evaluated on few-shot manipulation tasks with an eye-in-hand robotic arm, WM-DAgger achieves a 93.3% success rate on a soft pouch pushing task using only five expert demonstrations, substantially outperforming baseline methods.
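The summary names a task-oriented corrective action synthesis module but gives no details. The sketch below shows one way such action labels could be produced. It makes several hypothetical assumptions that are not taken from the paper: states are 3D end-effector positions, OOD states come from uniform noise added to expert poses, and the corrective label is a clipped step toward a *later* expert waypoint. All names (`perturb`, `corrective_action`, `synthesize_recovery_labels`, `lookahead`, `max_step`) are illustrative. In WM-DAgger the world model would also render the eye-in-hand observation at each perturbed pose. That rendering step is omitted here.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # end-effector position (assumption)


def perturb(pose: Vec3, scale: float, rng: random.Random) -> Vec3:
    """Offset an expert pose by uniform noise to emulate an OOD state."""
    return tuple(p + rng.uniform(-scale, scale) for p in pose)


def corrective_action(state: Vec3, expert_traj: Sequence[Vec3], t: int,
                      lookahead: int = 2, max_step: float = 0.02) -> Vec3:
    """Point from an OOD state toward a *later* expert waypoint, so the label
    both recovers and advances the task; the step length is clipped."""
    target = expert_traj[min(t + lookahead, len(expert_traj) - 1)]
    delta = [g - s for g, s in zip(target, state)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm > max_step:
        delta = [d * max_step / norm for d in delta]
    return tuple(delta)


def synthesize_recovery_labels(expert_traj: Sequence[Vec3], n_per_step: int = 3,
                               scale: float = 0.03, seed: int = 0
                               ) -> List[Tuple[Vec3, Vec3]]:
    """Return (OOD state, corrective action) pairs; rendering of the matching
    observation by a world model is deliberately left out of this sketch."""
    rng = random.Random(seed)
    samples = []
    for t, pose in enumerate(expert_traj):
        for _ in range(n_per_step):
            ood = perturb(pose, scale, rng)
            samples.append((ood, corrective_action(ood, expert_traj, t)))
    return samples
```

For example, calling `synthesize_recovery_labels` on a four-waypoint trajectory with the default `n_per_step=3` yields twelve labelled pairs. Aiming at a later waypoint rather than the exact pose that was perturbed is one plausible reading of "task-oriented". It keeps the label from merely undoing the perturbation.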

📝 Abstract

Imitation learning is a powerful paradigm for training robotic policies, yet its performance is limited by compounding errors: minor policy inaccuracies can drive the robot into out-of-distribution (OOD) states unseen in the training set, where the policy may produce even larger errors, eventually leading to failure. While the Data Aggregation (DAgger) framework tries to address this issue, its reliance on continuous human involvement severely limits scalability. In this paper, we propose WM-DAgger, an efficient data aggregation framework that leverages World Models to synthesize OOD recovery data without requiring human involvement. Specifically, we focus on manipulation tasks with an eye-in-hand robotic arm and only a few demonstrations. To avoid synthesizing misleading data and to overcome the hallucination issues inherent to World Models, our framework introduces two key mechanisms: (1) a Corrective Action Synthesis Module that generates task-oriented recovery actions to prevent misleading supervision, and (2) a Consistency-Guided Filtering Module that discards physically implausible trajectories by anchoring terminal synthesized frames to corresponding real frames in expert demonstrations. We extensively validate WM-DAgger on multiple real-world robotic tasks. Results show that our method significantly improves success rates, achieving a 93.3% success rate in soft bag pushing with only five demonstrations. The source code is publicly available at https://github.com/czs12354-xxdbd/WM-Dagger.
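The abstract states that the Consistency-Guided Filtering Module anchors each rollout's terminal synthesized frame to the corresponding real expert frame. The sketch below illustrates that idea under hypothetical assumptions. Frames are flattened grayscale pixel lists in [0, 1]. Each rollout records which expert frame it should rejoin. Consistency is scored by pixel-space mean squared error against a fixed threshold. The paper's actual similarity measure and threshold are not stated here, and the names `Rollout`, `anchor_index`, `consistency_filter` and `max_mse` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence

Frame = List[float]  # flattened grayscale pixels in [0, 1] (assumption)


@dataclass
class Rollout:
    frames: List[Frame]  # world-model frames for one synthesized recovery
    anchor_index: int    # index of the real expert frame it should rejoin


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_filter(rollouts: Sequence[Rollout],
                       expert_frames: Sequence[Frame],
                       max_mse: float = 0.01) -> List[Rollout]:
    """Keep rollouts whose last synthesized frame matches its anchored expert
    frame; mismatches are treated as hallucinated and discarded."""
    kept = []
    for r in rollouts:
        if not r.frames:
            continue  # nothing to compare, so the rollout cannot be verified
        if mse(r.frames[-1], expert_frames[r.anchor_index]) <= max_mse:
            kept.append(r)
    return kept
```

Checking only the terminal frame keeps the filter cheap. Because that frame is compared with a real image, a rollout whose dynamics drifted into an implausible state is unlikely to end close to the expert demonstration.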

Problem

Research questions and friction points this paper is trying to address.

Imitation Learning

Compounding Errors

Out-of-Distribution

Data Aggregation

World Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

World Models

Imitation Learning

Data Aggregation

Out-of-Distribution Recovery

Robot Manipulation