🤖 AI Summary
This work addresses zero-shot cross-platform robotic manipulation: enabling heterogeneous robots (e.g., a Franka arm and a Spot quadruped) to execute novel tasks directly from natural language descriptions, without task-specific demonstrations or platform-specific fine-tuning. Methodologically, we propose a “text-to-video → 3D actionable object flow” paradigm: a video generated from the task description is distilled into a 3D flow representation of object motion. For rigid objects, this flow yields relative poses that are realized through grasp proposals and trajectory optimization; for deformable objects, it serves as a tracking objective for planning with a particle-based dynamics model. High-level task understanding is thereby decoupled from low-level control. To our knowledge, this is the first approach to achieve zero-shot, demonstration-free, platform-agnostic manipulation across morphologically distinct robots. We validate it on rigid, articulated, and deformable objects.
📝 Abstract
Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that converts a task description into an actionable plan for a target robot without any demonstrations. Given a task description, NovaFlow synthesizes a video using a video generation model and distills it into 3D actionable object flow using off-the-shelf perception modules. From the object flow, it computes relative poses for rigid objects and realizes them as robot actions via grasp proposals and trajectory optimization. For deformable objects, this flow serves as a tracking objective for model-based planning with a particle-based dynamics model. By decoupling task understanding from low-level control, NovaFlow naturally transfers across embodiments. We validate on rigid, articulated, and deformable object manipulation tasks using a table-top Franka arm and a Spot quadrupedal mobile robot, and achieve effective zero-shot execution without demonstrations or embodiment-specific training. Project website: https://novaflow.lhy.xyz/.
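The step "from the object flow, it computes relative poses for rigid objects" can be illustrated with a minimal sketch. The abstract does not say which estimator NovaFlow uses; the code below assumes a standard least-squares fit (the Kabsch algorithm) between tracked 3D points at two time steps. The function name `relative_pose_from_flow` and the synthetic data are hypothetical and chosen for illustration only.

```python
import numpy as np


def relative_pose_from_flow(points_t0, points_t1):
    """Fit a rigid transform (R, t) such that points_t1 ~= points_t0 @ R.T + t.

    Assumption: uses the Kabsch algorithm; this is not necessarily
    the estimator used in NovaFlow.
    """
    p = np.asarray(points_t0, dtype=float)
    q = np.asarray(points_t1, dtype=float)

    # Centre both point sets so that only the rotation remains to be solved.
    p_mean = p.mean(axis=0)
    q_mean = q.mean(axis=0)

    # 3x3 cross-covariance between the centred point sets.
    H = (p - p_mean).T @ (q - q_mean)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


# Synthetic "object flow": 50 tracked points moved by a known rigid motion.
rng = np.random.default_rng(0)
points_t0 = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.10, -0.20, 0.05])
points_t1 = points_t0 @ R_true.T + t_true

R_est, t_est = relative_pose_from_flow(points_t0, points_t1)
```

On noise-free synthetic data like this, the fit recovers the applied motion up to floating-point error. Flow extracted from a generated video would be noisy and could contain outliers, so a practical pipeline would likely need a robust variant (for example, wrapping the fit in RANSAC).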