EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

📅 2025-07-08
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Existing language-guided robotic manipulation methods rely heavily on action-annotated data, and because most of them predict object-centric optical flow, they struggle with complex scenarios such as deformable objects, severe occlusions, and non-object-displacement tasks. To address these limitations, we propose Embodiment-Centric Flow (EC-Flow), a framework that treats the robot's own embodiment as the core of the interaction and uses its kinematics to improve generalization to deformable objects, occlusions, and non-object-displacement tasks. We further introduce a goal-alignment module that connects the flow to language instructions and object interactions, and unify EC-Flow prediction, goal-image generation, motion-consistency constraints, and URDF-driven kinematic transformation into an end-to-end video-to-action learning pipeline. Experiments in simulation (Meta-World) and in real-world settings report improvements over prior object-centric flow methods of 62% in occluded object handling, 45% in deformable object manipulation, and 80% in non-object-displacement tasks, achieving state-of-the-art performance.

📝 Abstract
Current language-guided robotic manipulation systems often require low-level action-labeled datasets for imitation learning. While object-centric flow prediction methods mitigate this issue, they remain limited to scenarios involving rigid objects with clear displacement and minimal occlusion. In this work, we present Embodiment-Centric Flow (EC-Flow), a framework that directly learns manipulation from action-unlabeled videos by predicting embodiment-centric flow. Our key insight is that incorporating the embodiment's inherent kinematics significantly enhances generalization to versatile manipulation scenarios, including deformable object handling, occlusions, and non-object-displacement tasks. To connect EC-Flow with language instructions and object interactions, we further introduce a goal-alignment module that jointly optimizes movement consistency and goal-image prediction. Moreover, translating EC-Flow into executable robot actions requires only a standard robot URDF (Unified Robot Description Format) file to specify kinematic constraints across joints, which makes it easy to use in practice. We validate EC-Flow on both simulation (Meta-World) and real-world tasks, demonstrating state-of-the-art performance with improvements over prior state-of-the-art object-centric flow methods in occluded object handling (62%), deformable object manipulation (45%), and non-object-displacement tasks (80%). For more information, see our project website at https://ec-flow1.github.io.
Problem

Research questions and friction points this paper is trying to address.

Learning robotic manipulation from action-unlabeled videos
Handling deformable objects, occlusions, and non-object-displacement tasks
Connecting flow prediction with language instructions and goals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predicts embodiment-centric flow from action-unlabeled videos
Uses a goal-alignment module to connect the flow with language instructions
Requires only a standard URDF file to turn the flow into executable robot actions
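
The abstract states that converting EC-Flow into robot actions relies on the kinematic constraints in a URDF file, but this page does not describe the procedure. As a rough illustration of the underlying idea only (points on one rigid link must move together, so their flow determines a single rigid motion), the sketch below fits a 2D rotation and translation to flow-displaced points by least squares. The function names `apply_flow` and `fit_rigid_2d`, the 2D setting, and the sample numbers are all assumptions made for illustration, not the authors' implementation.

```python
import math


def apply_flow(points, flow):
    """Displace each (x, y) point by its (u, v) flow vector."""
    return [(x + u, y + v) for (x, y), (u, v) in zip(points, flow)]


def fit_rigid_2d(src, dst):
    """Least-squares 2D rigid fit (hypothetical helper, not from the paper).

    Returns (theta, (tx, ty)) such that rotating each src point by theta
    and then translating by (tx, ty) best matches the dst points.
    """
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("src and dst must be non-empty and equally long")

    # Centroids of both point sets.
    cxs = sum(p[0] for p in src) / n
    cys = sum(p[1] for p in src) / n
    cxd = sum(p[0] for p in dst) / n
    cyd = sum(p[1] for p in dst) / n

    # Accumulate dot and cross products of the centered correspondences.
    s_dot = 0.0
    s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cxs, ys - cys
        bx, by = xd - cxd, yd - cyd
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    # The optimal rotation angle maximizes cos(t) * s_dot + sin(t) * s_cross.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)

    # The translation maps the rotated source centroid onto the target centroid.
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, (tx, ty)


if __name__ == "__main__":
    # Made-up points on a single rigid link and a made-up flow field.
    link_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    flow = [(0.1, 0.0), (0.1, 0.1), (0.0, 0.0)]
    moved = apply_flow(link_points, flow)
    theta, (tx, ty) = fit_rigid_2d(link_points, moved)
    print(f"theta={theta:.4f} rad, translation=({tx:.4f}, {ty:.4f})")
```

In a real system the analogous fit would presumably be done in 3D and per link, with the URDF indicating which points belong to which link and how the links are connected by joints; that part is not shown here.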
Yixiang Chen
New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Peiyan Li
Ludwig-Maximilians-Universität München
data mining, graph mining
Yan Huang
New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; FiveAges
Jiabing Yang
New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; FiveAges
Kehan Chen
New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Liang Wang
New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences