Articulation in Prime: Primitive-Based Articulated Object Understanding from a Single Casual Video

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Recovering the 3D kinematic structure of articulated objects from casually captured monocular videos is highly challenging due to occlusions, severe camera motion, and weak texture. This work proposes a category-agnostic optimization framework that formulates the problem as geometric primitive fitting. The method jointly optimizes primitive grouping, joint parameters, and part segmentation, explicitly incorporates revolute (rotational) and prismatic (translational) joint constraints, and introduces a visibility-aware mechanism to handle partial observations and occlusions. Requiring only a single input video, the approach outperforms existing methods on both the newly introduced AiP-synth and AiP-real benchmarks, and is reported to be particularly robust under strong occlusion and aggressive camera motion.

📝 Abstract

Retrieving the 3D kinematics of articulated objects from monocular video is a fundamental challenge in computer vision. Existing methods rely on complex video setups or cues such as long-term point tracking or wide-baseline matching, but are frequently brittle under severe occlusions, rapid camera ego-motion, or weak local features. Learning-based methods, meanwhile, struggle to generalize beyond their training categories. We propose a category-agnostic optimization framework that treats articulated object understanding as a primitive-fitting problem. Geometric primitives serve as a proxy representation that avoids the pitfalls of unstable point tracks; a novel mechanism organizes them into coherent parts constrained by revolute and prismatic joints. Our formulation jointly optimizes part segmentation and joint parameters, recovering complex kinematics from a single casually captured video. A visibility-aware procedure handles partial observations and occlusions inherent to real-world data. We also propose the AiP-synth and AiP-real benchmarks, featuring significant camera motion and heavy occlusions, and outperform existing methods. Project page: https://aartykov.github.io/Articulation-in-Prime/
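
The two joint types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `revolute_transform` and `prismatic_transform`, the point-array layout, and the parameterization (axis, pivot, angle, displacement) are assumptions made here purely for illustration. It only shows how a revolute joint rotates a part about an axis and how a prismatic joint slides a part along an axis, which is the kind of constraint a part's primitives would have to satisfy.

```python
import numpy as np


def revolute_transform(points, axis, pivot, angle):
    """Rotate (N, 3) points by `angle` radians about the line through
    `pivot` with direction `axis` (hypothetical helper, not from the paper)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([
        [0.0, -axis[2], axis[1]],
        [axis[2], 0.0, -axis[0]],
        [-axis[1], axis[0], 0.0],
    ])
    # Rodrigues' rotation formula: R = I + sin(a) K + (1 - cos(a)) K^2.
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    pivot = np.asarray(pivot, dtype=float)
    # Rotate about the pivot: shift to the pivot, rotate, shift back.
    return (np.asarray(points, dtype=float) - pivot) @ R.T + pivot


def prismatic_transform(points, axis, displacement):
    """Translate (N, 3) points by `displacement` along the unit direction
    of `axis` (hypothetical helper, not from the paper)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.asarray(points, dtype=float) + displacement * axis


# Usage: a door-like part hinged on a vertical axis, and a drawer-like
# part sliding along the x-axis. All values are made up for illustration.
door_corner = np.array([[2.0, 1.0, 0.0]])
opened = revolute_transform(door_corner, axis=[0, 0, 1],
                            pivot=[1, 1, 0], angle=np.pi / 2)
drawer_corner = np.array([[0.0, 0.0, 0.0]])
pulled = prismatic_transform(drawer_corner, axis=[1, 0, 0], displacement=0.3)
```

In an optimization setting, parameters such as the axis, pivot, and per-frame angle or displacement would be the unknowns, and the transformed primitives would be compared against observations; how the paper sets up that objective is not specified in this summary.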

Problem

Research questions and friction points this paper is trying to address.

articulated objects

3D kinematics

monocular video

occlusion

category-agnostic

Innovation

Methods, ideas, or system contributions that make the work stand out.

primitive-based representation

articulated object understanding

monocular video