UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
When deploying vision-based motor policies across robot morphologies, dynamics mismatch leads to out-of-distribution behavior and execution failure. To address this, we propose Embodiment-Aware Diffusion Policies (EADP). EADP trains a general-purpose visuomotor policy on human demonstrations collected with a handheld gripper, then, during inference, incorporates morphology-specific controller gradients directly into the diffusion sampling process, enabling plug-and-play trajectory adaptation without fine-tuning. Crucially, EADP is the first method to explicitly integrate low-level control gradients into diffusion-based policy generation, facilitating real-time, dynamically feasible trajectory guidance even on resource-constrained platforms such as aerial manipulators. Experiments on multi-task, long-horizon aerial manipulation demonstrate significant improvements in success rate and robustness. EADP exhibits strong cross-morphology and cross-environment generalization, along with zero-shot transfer capability.

📝 Abstract
We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments, such as aerial manipulators, is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller's tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse and even highly constrained embodiments. All code, data, and checkpoints will be publicly released after acceptance. Result videos can be found at umi-on-air.github.io.
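The abstract's core mechanism, steering diffusion sampling with the gradient of the controller's tracking cost, follows the pattern of guidance-style sampling. The sketch below is an illustrative assumption, not the paper's implementation: the toy denoiser, the quadratic tracking cost, and all function names are hypothetical stand-ins showing only where the gradient term enters the sampling loop.

```python
import numpy as np

def tracking_cost_grad(traj, feasible_ref):
    """Gradient of a toy quadratic tracking cost ||traj - feasible_ref||^2.
    A real controller would supply this from its tracking error rollout."""
    return 2.0 * (traj - feasible_ref)

def denoise_step(traj, t):
    """Stand-in for the learned diffusion denoiser (here: shrink the sample;
    a real policy would predict and remove noise with a network)."""
    return traj * (1.0 - 1.0 / t) if t > 1 else traj

def guided_sample(feasible_ref, horizon=16, dim=3, steps=50,
                  guidance=0.05, seed=0):
    """Generate a trajectory, biasing each denoising step toward
    dynamically feasible modes via the controller's cost gradient."""
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=(horizon, dim))   # start from Gaussian noise
    for t in range(steps, 0, -1):
        traj = denoise_step(traj, t)         # standard denoising update
        # Embodiment-aware guidance: descend the tracking cost so the
        # sample drifts toward trajectories the controller can execute.
        traj = traj - guidance * tracking_cost_grad(traj, feasible_ref)
    return traj
```

Setting `guidance=0.0` recovers the unguided baseline; a larger weight trades policy fidelity for feasibility, mirroring the plug-and-play, test-time nature of the approach (no retraining, only an extra gradient term per sampling step).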
Problem

Research questions and friction points this paper is trying to address.

Effectively transferring visuomotor policies to constrained robotic embodiments
Addressing control mismatch between training and deployment robot dynamics
Enabling embodiment-aware trajectory adaptation for diverse manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embodiment-Aware Diffusion Policy for dynamic adaptation
Integrates gradient feedback from the controller's tracking cost into diffusion sampling
Enables plug-and-play trajectory adaptation for diverse embodiments