MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing semantic segmentation methods: they do not provide the motion-level interaction cues needed for embodied reasoning and robotic manipulation, and in particular they cannot reliably identify manipulable rigid-body motion units in real-world scenes. To this end, the authors propose MotionBit, a minimal unit of rigid-body motion defined by the kinematic equivalence of spatial twists, and introduce MoRiBo, a hand-labeled benchmark for moving rigid-body segmentation across robotic manipulation and human-in-the-wild videos. Their learning-free, graph-based segmentation method groups regions by twist equivalence, independent of semantics. On MoRiBo, it outperforms state-of-the-art embodied perception methods by 37.3% in macro-averaged mIoU, and the authors demonstrate its effectiveness for downstream embodied reasoning and manipulation tasks.
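The summary reports results in macro-averaged mIoU without defining how the averaging is done. As a minimal sketch, assuming "macro-averaged" means an unweighted mean of per-subset mIoU (for example, over the robotic-manipulation and human-in-the-wild portions of MoRiBo), the difference from pooled (micro) averaging can be shown as follows. The function names, the set-of-pixels mask representation, and the subset keys are hypothetical illustrations, not the benchmark's evaluation code.

```python
from statistics import mean


def iou(pred, gt):
    """Intersection over union of two masks given as sets of pixel coordinates.

    Returns 1.0 when both masks are empty (a common convention, assumed here).
    """
    union = pred | gt
    if not union:
        return 1.0
    return len(pred & gt) / len(union)


def macro_average(scores_by_subset):
    """Unweighted mean over subsets of each subset's mean score.

    Every subset counts equally, however many videos it contains.
    """
    return mean(mean(scores) for scores in scores_by_subset.values())


def micro_average(scores_by_subset):
    """Mean over all per-video scores pooled together, for comparison."""
    return mean(s for scores in scores_by_subset.values() for s in scores)
```

With hypothetical per-video scores `{"robot_manipulation": [0.5, 0.75], "human_in_the_wild": [0.25]}`, `macro_average` returns 0.4375 while `micro_average` returns 0.5: under macro-averaging the smaller subset carries as much weight as the larger one.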

📝 Abstract
Rigid bodies constitute the smallest manipulable elements in the real world, and understanding how they physically interact is fundamental to embodied reasoning and robotic manipulation. Thus, accurate detection, segmentation, and tracking of moving rigid bodies is essential for enabling reasoning modules to interpret and act in diverse environments. However, current segmentation models trained on semantic grouping are limited in their ability to provide meaningful interaction-level cues for completing embodied tasks. To address this gap, we introduce MotionBit, a novel concept that, unlike prior formulations, defines the smallest unit in motion-based segmentation through kinematic spatial twist equivalence, independent of semantics. In this paper, we contribute (1) the MotionBit concept and definition, (2) a hand-labeled benchmark, called MoRiBo, for evaluating moving rigid-body segmentation across robotic manipulation and human-in-the-wild videos, and (3) a learning-free graph-based MotionBits segmentation method that outperforms state-of-the-art embodied perception methods by 37.3% in macro-averaged mIoU on the MoRiBo benchmark. Finally, we demonstrate the effectiveness of MotionBits segmentation for downstream embodied reasoning and manipulation tasks, highlighting its importance as a fundamental primitive for understanding physical interactions.
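The abstract does not spell out the segmentation algorithm, but the idea of spatial twist equivalence can be illustrated. For a rigid body with spatial twist (ω, v), every point p (in world coordinates) moves with velocity ω × p + v, so two regions belong to the same rigid motion exactly when one twist explains both. The sketch below assumes each candidate segment already comes with 3D points and per-point velocities in a single world frame (an assumption; the paper's inputs are not described here). It fits one twist per segment by least squares and merges segments whose twists agree within a tolerance. `fit_spatial_twist`, `group_by_twist`, and `tol` are hypothetical names, and the union-find merge over all segment pairs is a stand-in, not the authors' graph-based method.

```python
import numpy as np


def skew(p):
    """Return the 3x3 matrix [p]_x such that skew(p) @ w == np.cross(p, w)."""
    x, y, z = p
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def fit_spatial_twist(points, velocities):
    """Least-squares spatial twist xi = (omega, v) with velocity_i = omega x p_i + v.

    Since omega x p = -[p]_x omega, each point contributes the rows [-[p]_x | I].
    At least three non-collinear points are needed for a unique solution.
    """
    A = np.vstack([np.hstack([-skew(p), np.eye(3)]) for p in points])  # (3N, 6)
    b = np.asarray(velocities, dtype=float).reshape(-1)                # (3N,)
    xi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xi  # [omega_x, omega_y, omega_z, v_x, v_y, v_z]


def group_by_twist(segments, tol=1e-3):
    """Label segments so that those with equivalent twists share a label.

    segments: list of (points, velocities) pairs, each an (N, 3) array.
    Two segments are linked when their fitted twists differ by less than tol;
    union-find then returns the connected components of that graph.
    """
    twists = [fit_spatial_twist(p, v) for p, v in segments]
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if np.linalg.norm(twists[i] - twists[j]) < tol:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(segments))]
```

For example, splitting each of two synthetic rigid bodies with different twists into two segments and passing all four to `group_by_twist` yields two labels, one per body. Comparing twists directly is only meaningful because all of them are expressed in the same world frame, which is what makes them spatial twists; with noisy real velocities, `tol` would need tuning.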
Problem

Research questions and friction points this paper is trying to address.

rigid body segmentation
motion-based segmentation
embodied reasoning
physical interaction
video segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

MotionBit
rigid body segmentation
spatial twist equivalence
embodied perception
kinematic-based segmentation