🤖 AI Summary
General-purpose robotic policies typically require large-scale expert demonstrations or extensive simulation training; however, existing approaches suffer from low data efficiency and struggle to achieve high success rates in multi-task generalization from limited demonstrations.
Method: We propose a performance-aware multi-task policy distillation framework that unifies task-specific expert policies into a single generalist policy. A Kalman-filter-based learning gain estimator dynamically allocates scarce expert demonstrations to maximize data efficiency. The framework integrates DAgger, behavioral cloning, and multi-task learning to enable cross-task knowledge transfer.
Contribution/Results: Evaluated on MetaWorld and IsaacLab drawer-opening tasks, our method achieves significantly higher zero-shot transfer success rates on real robots compared to baselines, while reducing required expert demonstrations by 40–60%. It demonstrates superior data efficiency and scalable generalization across diverse manipulation tasks.
📝 Abstract
Generalist robot policies that can perform many tasks typically require extensive expert data or simulations for training. In this work, we propose a novel Data-Efficient multitask DAgger framework that distills a single multitask policy from multiple task-specific expert policies. Our approach significantly increases the overall task success rate by actively focusing on tasks where the multitask policy underperforms. The core of our method is a performance-aware scheduling strategy that tracks how much each task's learning process benefits from the amount of data, using a Kalman filter-based estimator to robustly decide how to allocate additional demonstrations across tasks. We validate our approach on MetaWorld, as well as a suite of diverse drawer-opening tasks in IsaacLab. The resulting policy attains high performance across all tasks while using substantially fewer expert demonstrations, and the visual policy learned with our method in simulation shows better performance than naive DAgger and Behavior Cloning when transferring zero-shot to a real robot without using real data.