NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses zero-shot, cross-embodiment robotic manipulation. The goal is to let morphologically different robots (a table-top Franka arm and a Spot quadruped) carry out novel tasks from a natural-language task description. It does this without task-specific demonstrations or embodiment-specific fine-tuning. The method follows a "text-to-video → 3D actionable object flow" pipeline. A video generation model first synthesizes a video of the task, and off-the-shelf perception modules then distill that video into 3D object flow. For rigid objects, the flow yields relative object poses, which are realized as robot actions through grasp proposals and trajectory optimization. For deformable objects, the flow serves as a tracking objective for model-based planning with a particle-based dynamics model. Separating high-level task understanding from low-level control is what allows the same plan to transfer across embodiments. Experiments on rigid, articulated, and deformable objects show effective zero-shot execution on both robots without demonstrations or robot-specific training.
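The step of distilling a generated video into 3D object flow can be illustrated with a minimal sketch. The source says only that off-the-shelf perception modules are used. Here we assume that a point tracker has produced 2D pixel tracks and a depth estimator has produced a depth value for each tracked point. We then lift the tracks to 3D with a standard pinhole camera model. The function names and the intrinsics `fx, fy, cx, cy` are hypothetical illustrations, not NovaFlow's actual interface.

```python
import numpy as np

def backproject_tracks(tracks_uv, depths, fx, fy, cx, cy):
    """Lift 2D point tracks to 3D camera-frame points (pinhole model).

    tracks_uv: (T, N, 2) pixel coordinates (u, v) of N tracked points over T frames.
    depths:    (T, N) metric depth sampled at each tracked pixel (assumed available).
    Returns:   (T, N, 3) points (X, Y, Z) in the camera frame.
    """
    u, v = tracks_uv[..., 0], tracks_uv[..., 1]
    z = depths
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

def flow_from_points(points):
    """3D object flow: displacement of every point relative to the first frame."""
    return points - points[:1]

# Two frames, two tracked points (made-up numbers for illustration only).
tracks = np.array([[[320.0, 240.0], [420.0, 240.0]],
                   [[330.0, 240.0], [430.0, 250.0]]])
depths = np.array([[1.0, 2.0], [1.0, 2.0]])
points = backproject_tracks(tracks, depths, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
flow = flow_from_points(points)
```

In practice, depth from a generated video is only known up to scale. A real pipeline would therefore need some way to anchor it to the robot's observed scene. This sketch does not attempt that.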


📝 Abstract

Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that converts a task description into an actionable plan for a target robot without any demonstrations. Given a task description, NovaFlow synthesizes a video using a video generation model and distills it into 3D actionable object flow using off-the-shelf perception modules. From the object flow, it computes relative poses for rigid objects and realizes them as robot actions via grasp proposals and trajectory optimization. For deformable objects, this flow serves as a tracking objective for model-based planning with a particle-based dynamics model. By decoupling task understanding from low-level control, NovaFlow naturally transfers across embodiments. We validate on rigid, articulated, and deformable object manipulation tasks using a table-top Franka arm and a Spot quadrupedal mobile robot, and achieve effective zero-shot execution without demonstrations or embodiment-specific training. Project website: https://novaflow.lhy.xyz/.
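For rigid objects, the abstract says NovaFlow computes relative poses from the object flow. The example below is a minimal sketch of one standard way to do that, the Kabsch (SVD) least-squares fit between corresponding 3D points at two frames of the flow. It is a swapped-in textbook technique, not necessarily the authors' exact solver. The helper names are hypothetical. The `gripper_goal` step assumes a rigid, non-slipping grasp, in which case the end-effector simply inherits the object's transform.

```python
import numpy as np

def relative_pose_from_flow(p_start, p_end):
    """Kabsch fit: rotation R and translation t with p_end ≈ p_start @ R.T + t.

    p_start, p_end: (N, 3) positions of the same N object points at two frames
    of the 3D object flow (N >= 3, not all collinear).
    """
    mu_s, mu_e = p_start.mean(axis=0), p_end.mean(axis=0)
    H = (p_start - mu_s).T @ (p_end - mu_e)   # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # -1 would mean a reflection; correct it
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_e - R @ mu_s
    return R, t

def to_homogeneous(R, t):
    """Pack (R, t) into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def gripper_goal(T_grasp_start, R, t):
    """Assumption: rigid grasp, so the gripper moves with the object (world frame)."""
    return to_homogeneous(R, t) @ T_grasp_start
```

Applying the fit to successive flow frames, not just the first and last, would give a sequence of intermediate waypoints. A trajectory optimizer could then smooth those waypoints subject to the robot's kinematic limits.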

Problem

Research questions and friction points this paper is trying to address.

Enables zero-shot robot manipulation without demonstrations

Transfers across robot platforms without fine-tuning

Handles rigid and deformable objects through video synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts task descriptions into actionable robot plans

Synthesizes videos and distills 3D object flow

Uses flow for rigid/deformable object manipulation planning
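For deformable objects, the source states only that the flow serves as a tracking objective for model-based planning with a particle-based dynamics model. The sketch below illustrates that idea under loud assumptions. `toy_dynamics` is a hypothetical stand-in for a real particle dynamics model: particles near the gripper translate with it, and all others stay still. The cost is the mean squared distance to the flow's target particle positions. The optimizer is plain random shooting, a generic model-predictive-control baseline and not necessarily what the authors use.

```python
import numpy as np

def toy_dynamics(particles, grip, action, radius=0.05):
    """Hypothetical particle model: particles within `radius` of the gripper
    translate with it; all other particles stay put."""
    near = np.linalg.norm(particles - grip, axis=1) < radius
    moved = particles.copy()
    moved[near] += action
    return moved, grip + action

def tracking_cost(particles, target):
    """Mean squared distance between predicted particles and flow targets."""
    return float(np.mean(np.sum((particles - target) ** 2, axis=1)))

def random_shooting(particles, grip, target, horizon=5, samples=256, step=0.01, seed=0):
    """Sample action sequences, roll each out through the model, and return the
    first action of the lowest-cost sequence (MPC: execute it, then replan)."""
    particles = np.asarray(particles, dtype=float)
    grip = np.asarray(grip, dtype=float)
    rng = np.random.default_rng(seed)
    best_cost, best_seq = np.inf, None
    for _ in range(samples):
        seq = rng.uniform(-step, step, size=(horizon, 3))  # gripper displacements
        p, g = particles, grip
        for a in seq:
            p, g = toy_dynamics(p, g, a)
        cost = tracking_cost(p, target)
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq[0], best_cost
```

Using the flow as a tracking target means the planner does not need a symbolic goal. It only has to move the particles toward where the generated video says they should end up.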

🔎 Similar Papers

Vision-based Manipulation from Single Human Video with Open-World Object Graphs