🤖 AI Summary
Diffusion-based robotic policy learning suffers from low data efficiency and heavy reliance on large-scale action-labeled demonstrations. To address this, we propose a "planning–control" decoupled framework: (i) learning a dynamics-consistent latent representation from unlabeled, observation-only demonstrations; (ii) modeling state evolution via a Deep Koopman operator; and (iii) employing a diffusion model as a latent-space planner, with action generation realized through a lightweight linear decoder. This work is the first to integrate Koopman dynamics with diffusion-based planning, substantially reducing dependence on action annotations. Evaluated on both simulated and real-world robotic manipulation tasks, our method achieves comparable performance using only ~1/3 of the action-labeled data required by baseline approaches, yielding over 3× improvement in action-data efficiency and enabling robust multimodal imitation learning.
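The core of step (ii), fitting a Koopman operator from observation-only data, can be illustrated with a toy sketch. This is not the paper's implementation (which learns a deep encoder jointly): here the "encoded observations" are generated directly from a known ground-truth operator, and the operator is recovered by least squares over consecutive latent pairs, with no action labels involved.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, T = 4, 50

# Ground-truth latent Koopman operator (assumed, for this toy example):
# a contraction so the rollout stays bounded.
K_true = 0.95 * np.linalg.qr(rng.normal(size=(latent_dim, latent_dim)))[0]

# Roll out a latent trajectory z_{t+1} = K_true z_t, standing in for
# encoder outputs on observation-only demonstrations.
Z = np.zeros((T, latent_dim))
Z[0] = rng.normal(size=latent_dim)
for t in range(T - 1):
    Z[t + 1] = K_true @ Z[t]

# Fit K by least squares on consecutive pairs (no action labels needed).
Z_now, Z_next = Z[:-1], Z[1:]
# lstsq solves Z_now @ X = Z_next in row convention; transpose to get
# the column-convention operator z_{t+1} = K_hat z_t.
K_hat = np.linalg.lstsq(Z_now, Z_next, rcond=None)[0].T

print(np.max(np.abs(K_hat - K_true)))  # small: exact recovery up to numerics
```

In the noise-free linear case the least-squares fit recovers the operator exactly; in the actual framework, the encoder is trained so that real observation trajectories become (approximately) linear in the latent space, which is what makes this simple regression viable.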
📝 Abstract
Recent advances in diffusion-based robot policies have demonstrated significant potential in imitating multi-modal behaviors. However, these approaches typically require large quantities of demonstration data paired with corresponding robot action labels, creating a substantial data-collection burden. In this work, we propose a plan-then-control framework aimed at improving the action-data efficiency of inverse dynamics controllers by leveraging observational demonstration data. Specifically, we adopt a Deep Koopman Operator framework to model the dynamical system and utilize observation-only trajectories to learn a latent action representation. This latent representation can then be effectively mapped to real high-dimensional continuous actions using a linear action decoder, requiring minimal action-labeled data. Through experiments on simulated robot manipulation tasks and a real-robot experiment with multi-modal expert demonstrations, we demonstrate that our approach significantly enhances action-data efficiency and achieves high task success rates with limited action data.
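The claim that a linear action decoder needs only minimal labeled data can be sketched as follows. This toy example assumes (purely for illustration; the variable names and the residual form of the latent action are not the paper's) that the latent action is the deviation of the next latent state from the autonomous Koopman step, u_t = z_{t+1} − K z_t, and that real actions are a linear function of u_t. A handful of labeled pairs then suffices to fit the decoder by least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, action_dim = 4, 2

# Assumed toy model: known Koopman operator K and a ground-truth linear
# decoder W_true mapping latent actions to real actions.
K = 0.9 * np.eye(latent_dim)
W_true = rng.normal(size=(action_dim, latent_dim))

# Only a small labeled set is used to fit the decoder.
n_labeled = 20
Z_now = rng.normal(size=(n_labeled, latent_dim))
U = rng.normal(size=(n_labeled, latent_dim))   # latent actions
Z_next = Z_now @ K.T + U                       # controlled latent transition
A = U @ W_true.T                               # ground-truth action labels

# Recover latent actions from consecutive latents, then fit the linear
# decoder by least squares.
U_hat = Z_next - Z_now @ K.T
W_hat = np.linalg.lstsq(U_hat, A, rcond=None)[0].T

print(np.max(np.abs(W_hat - W_true)))  # small: decoder recovered from 20 labels
```

Because the decoder is linear, its sample complexity scales with the latent dimension rather than with task complexity, which is the intuition behind the reported reduction in required action-labeled data.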