Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers

📅 2024-10-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion-based robotic policy learning suffers from low data efficiency and heavy reliance on large-scale action-labeled demonstrations. To address this, we propose a “planning–control” decoupled framework: (i) learning a dynamics-consistent latent representation from unlabeled, observation-only demonstrations; (ii) modeling state evolution via a Deep Koopman operator; and (iii) employing a diffusion model as a latent-space planner, with actions generated by a lightweight linear decoder. This work is the first to integrate Koopman dynamics with diffusion-based planning, substantially reducing dependence on action annotations. Evaluated on both simulated and real-world robotic manipulation tasks, our method achieves comparable performance using only ~1/3 of the action-labeled data required by baseline approaches, a more than 3× gain in action-data efficiency, while enabling robust multimodal imitation learning.
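The Koopman idea behind step (ii) can be sketched as follows: lift observations into a space where the dynamics evolve (approximately) linearly, z_{t+1} ≈ K z_t, and fit K by least squares on observation-only trajectories. The polynomial lifting and the toy scalar system below are illustrative stand-ins, not the paper's learned deep encoder.

```python
import numpy as np

def lift(x):
    # Toy Koopman observables [x, x^2]; the paper learns this map with a deep network.
    return np.concatenate([x, x**2], axis=-1)

def fit_koopman(states):
    """states: (T, d) observation-only trajectory; returns K with z_next ~= z @ K."""
    Z = lift(states)
    Z_now, Z_next = Z[:-1], Z[1:]          # time-shifted pairs of lifted states
    K, *_ = np.linalg.lstsq(Z_now, Z_next, rcond=None)  # least-squares fit of K
    return K

# Example: the scalar system x_{t+1} = 0.9 x_t is exactly linear in [x, x^2],
# since x^2 evolves as (x^2)_{t+1} = 0.81 (x^2)_t.
states = (2.0 * 0.9 ** np.arange(10)).reshape(-1, 1)
K = fit_koopman(states)
z_pred = lift(np.array([[1.5]])) @ K  # one-step prediction in the lifted space
```

No action labels are used anywhere above, which is the point: the dynamics model is learned from observation-only data.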

📝 Abstract
Recent advances in diffusion-based robot policies have demonstrated significant potential in imitating multi-modal behaviors. However, these approaches typically require large quantities of demonstration data paired with corresponding robot action labels, creating a substantial data collection burden. In this work, we propose a plan-then-control framework aimed at improving the action-data efficiency of inverse dynamics controllers by leveraging observational demonstration data. Specifically, we adopt a Deep Koopman Operator framework to model the dynamical system and utilize observation-only trajectories to learn a latent action representation. This latent representation can then be effectively mapped to real high-dimensional continuous actions using a linear action decoder, requiring minimal action-labeled data. Through experiments on simulated robot manipulation tasks and a real robot experiment with multi-modal expert demonstrations, we demonstrate that our approach significantly enhances action-data efficiency and achieves high task success rates with limited action data.
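The abstract's "linear action decoder" reduces to a least-squares map from planned latent representations to real actions, fit on a small action-labeled set. A minimal sketch (all dimensions, the synthetic data, and the exact-linear assumption are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: latents z (dim 8) relate to actions a (dim 2) through an
# unknown linear map W_true; only a handful of action-labeled pairs exist.
W_true = rng.normal(size=(8, 2))
Z_labeled = rng.normal(size=(20, 8))   # 20 action-labeled latent states
A_labeled = Z_labeled @ W_true         # their corresponding actions

# Fit the linear decoder by least squares on the small labeled set.
W_hat, *_ = np.linalg.lstsq(Z_labeled, A_labeled, rcond=None)

# At control time, a latent produced by the planner maps to a real action.
z_plan = rng.normal(size=(1, 8))
action = z_plan @ W_hat
```

Because the decoder is linear, a few dozen labeled pairs suffice to pin it down, which is how the framework keeps the action-label requirement small.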
Problem

Research questions and friction points this paper is trying to address.

Improving action-data efficiency in imitation learning
Learning latent action representation from observation-only data
Achieving high task success with limited action-labeled data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion planners for multi-modal imitation learning
Deep Koopman controllers for latent action representation
Linear action decoder for minimal labeled data