Articulated Object Estimation in the Wild

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle to robustly estimate 3D motion of articulated objects under dynamic camera motion and partial observability. This paper introduces ArtiPoint, the first framework to jointly model articulated trajectories and rotation axes directly from unconstrained, egocentric RGB-D videos. To support this, we introduce Arti4D—the first in-the-wild dataset featuring real-world human-object interactions, fine-grained articulation constraint annotations, and ground-truth camera poses. ArtiPoint combines deep point tracking with factor graph optimization to enable point-level motion analysis and structured geometric reasoning. Evaluated on Arti4D, ArtiPoint significantly outperforms both classical and learning-based baselines, achieving state-of-the-art accuracy in joint trajectory estimation and rotation axis recovery. Both code and dataset are publicly released.

📝 Abstract
Understanding the 3D motion of articulated objects is essential in robotic scene understanding, mobile manipulation, and motion planning. Prior methods for articulation estimation have primarily focused on controlled settings, assuming either fixed camera viewpoints or direct observations of various object states, which tend to fail in more realistic unconstrained environments. In contrast, humans effortlessly infer articulation by watching others manipulate objects. Inspired by this, we introduce ArtiPoint, a novel estimation framework that can infer articulated object models under dynamic camera motion and partial observability. By combining deep point tracking with a factor graph optimization framework, ArtiPoint robustly estimates articulated part trajectories and articulation axes directly from raw RGB-D videos. To foster future research in this domain, we introduce Arti4D, the first egocentric in-the-wild dataset that captures articulated object interactions at a scene level, accompanied by articulation labels and ground-truth camera poses. We benchmark ArtiPoint against a range of classical and learning-based baselines, demonstrating its superior performance on Arti4D. We make code and Arti4D publicly available at https://artipoint.cs.uni-freiburg.de.
Problem

Research questions and friction points this paper addresses.

Estimating 3D articulated object motion in unconstrained environments
Overcoming limitations of fixed viewpoints and partial observability
Inferring articulation models from raw RGB-D videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines deep point tracking with factor graph optimization
Estimates part trajectories from raw RGB-D videos
Works under dynamic camera motion and partial observability
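The summary does not detail ArtiPoint's factor graph formulation, but the core geometric step it relies on, recovering a part's rigid motion and rotation axis from tracked 3D points, can be illustrated with standard tools. The sketch below is a simplified stand-in, not the paper's method: it fits a rigid transform between two sets of tracked points with the Kabsch algorithm, then extracts the rotation axis as the eigenvector of the rotation matrix for eigenvalue 1. Function names and the two-frame setup are illustrative assumptions.

```python
import numpy as np

def fit_rigid_transform(P, Q):
    """Kabsch: least-squares R, t with Q ≈ (R @ P.T).T + t.

    P, Q: (N, 3) arrays of corresponding 3D points (e.g. tracked
    points on an articulated part in two frames).
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

def rotation_axis(R):
    """Unit rotation axis of R: the eigenvector for eigenvalue 1."""
    w, v = np.linalg.eig(R)
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return axis / np.linalg.norm(axis)
```

In a full pipeline along the lines the summary describes, per-frame estimates like these would be fused over the whole video (together with camera poses) by a factor graph optimizer rather than taken from a single frame pair.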