AI Summary
Diffusion-based policies exhibit limited generalization in robotic manipulation, struggling to transfer across unseen robot arms or novel tasks without costly retraining and data collection. This work proposes a zero-shot, inference-time adaptation method that enables immediate cross-hardware and dynamic-task deployment without retraining. Our approach jointly optimizes differentiable SE(3) trajectory generation and projects the outputs onto kinematic and task-specific constraints via a differentiable projection layer. Crucially, we are the first to embed physics-consistent modeling directly into the diffusion policy's inference process, using differentiable projection to bridge vision-motor representations with real-world actuator constraints. We validate the method on multiple physical robotic platforms, including diverse manipulators and end-effectors, demonstrating high success rates and robustness across grasping, pushing, and pouring tasks. Results show substantial improvement in cross-platform deployability of diffusion policies, enabling practical real-world adaptation with no additional training.
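The core idea of projecting diffusion outputs onto feasible sets during inference can be illustrated with a minimal sketch. The code is not from the paper (which is unreleased); the function names (`project_to_constraints`, `denoise_with_projection`), the 4-vector action format `[x, y, z, jaw]`, and the specific constraint values are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_to_constraints(action, z_min=0.05, jaw_max=0.10):
    """Hypothetical projection: clip the end-effector height so it clears an
    obstacle, and clamp the commanded jaw width to the new gripper's range.
    `action` is an illustrative [x, y, z, jaw_width] vector."""
    x, y, z, jaw = action
    return np.array([x, y, max(z, z_min), min(jaw, jaw_max)])

def denoise_with_projection(x_noisy, denoise_step, n_steps=10):
    """Toy reverse-diffusion loop: after each denoising update, project the
    sample back onto the constraint set. This mirrors the adaptation-projection
    idea of enforcing hardware/task constraints inside inference, without any
    retraining of the policy itself."""
    x = x_noisy
    for t in range(n_steps):
        x = denoise_step(x, t)          # one reverse-diffusion update (stub)
        x = project_to_constraints(x)   # enforce kinematic/task constraints
    return x
```

The stubbed `denoise_step` stands in for the trained diffusion policy's update; the key point is that the projection runs per-step, so constraint changes (a taller obstacle, a narrower gripper) can be applied at deployment time.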
Abstract
Diffusion policies are powerful visuomotor models for robotic manipulation, yet they often fail to generalize to manipulators or end-effectors unseen during training and struggle to accommodate new task requirements at inference time. Addressing this typically requires costly data recollection and policy retraining for each new hardware or task configuration. To overcome this, we introduce an adaptation-projection strategy that enables a diffusion policy to perform zero-shot adaptation to novel manipulators and dynamic task settings, entirely at inference time and without any retraining. Our method first trains a diffusion policy in SE(3) space using demonstrations from a base manipulator. During online deployment, it projects the policy's generated trajectories to satisfy the kinematic and task-specific constraints imposed by the new hardware and objectives. Moreover, this projection dynamically adapts to physical differences (e.g., tool-center-point offsets, jaw widths) and task requirements (e.g., obstacle heights), ensuring robust and successful execution. We validate our approach on real-world pick-and-place, pushing, and pouring tasks across multiple manipulators, including the Franka Panda and Kuka iiwa 14, equipped with a diverse array of end-effectors like flexible grippers, Robotiq 2F/3F grippers, and various 3D-printed designs. Our results demonstrate consistently high success rates in these cross-manipulator scenarios, proving the effectiveness and practicality of our adaptation-projection strategy. The code will be released after peer review.