PAOLI: Pose-free Articulated Object Learning from Sparse-view Images

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses self-supervised learning of geometric and kinematic representations of articulated objects from as few as four sparse, unposed images per articulation state. It introduces a progressive disentanglement framework that separates static from moving parts and thereby decouples camera motion from joint articulation. Each articulation state is first reconstructed independently with sparse-view 3D reconstruction; a deformation field then establishes dense correspondences across states, and geometry, appearance, and kinematics are jointly optimized under a self-supervised loss that enforces cross-view and cross-pose consistency. Unlike prior methods, the approach requires neither ground-truth camera poses nor dense multi-view observations. Evaluated on the standard benchmark and real-world examples, it produces accurate and detailed articulated object representations under significantly weaker input assumptions than existing approaches.

📝 Abstract

We present a novel self-supervised framework for learning articulated object representations from sparse-view, unposed images. Unlike prior methods that require dense multi-view observations and ground-truth camera poses, our approach operates with as few as four views per articulation and no camera supervision. To address the inherent challenges, we first reconstruct each articulation independently using recent advances in sparse-view 3D reconstruction, then learn a deformation field that establishes dense correspondences across poses. A progressive disentanglement strategy further separates static from moving parts, enabling robust separation of camera and object motion. Finally, we jointly optimize geometry, appearance, and kinematics with a self-supervised loss that enforces cross-view and cross-pose consistency. Experiments on the standard benchmark and real-world examples demonstrate that our method produces accurate and detailed articulated object representations under significantly weaker input assumptions than existing approaches.
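The abstract does not spell out how camera motion is separated from object motion, so the following is only a minimal illustrative sketch, not the authors' method. It assumes that point correspondences between two articulation states are already available (the role the deformation field plays above) and swaps in a classical technique: a RANSAC-style search with the Kabsch algorithm finds the rigid transform explained by the largest set of points, which is treated as the camera motion acting on the static part. Points that do not fit that transform are labelled as moving. The function names, the threshold, and the trial count are all hypothetical.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def split_static_moving(pts_a, pts_b, thresh=0.02, trials=200, seed=0):
    """Hypothetical helper: estimate camera motion and a static/moving mask.

    pts_a, pts_b: (N, 3) corresponding points in two articulation states,
    each expressed in its own (unknown) camera frame.
    Returns (R, t, static) where static is a boolean mask over the N points.
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts_a), dtype=bool)
    for _ in range(trials):
        # Fit a rigid transform to a minimal sample of three correspondences.
        idx = rng.choice(len(pts_a), size=3, replace=False)
        R, t = kabsch(pts_a[idx], pts_b[idx])
        inliers = np.linalg.norm(pts_a @ R.T + t - pts_b, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("no rigid transform explains at least three points")
    # Refit on the whole consensus set, then relabel every point.
    R, t = kabsch(pts_a[best], pts_b[best])
    static = np.linalg.norm(pts_a @ R.T + t - pts_b, axis=1) < thresh
    return R, t, static
```

This sketch relies on the static part being the largest rigid group in the scene; points lying very close to a joint axis barely move and would be labelled static, which is one reason a learned, progressive strategy is more robust than a single threshold.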

Problem

Research questions and friction points this paper is trying to address.

Learn articulated object representations from sparse unposed images

Reconstruct each articulation independently using sparse views

Separate static and moving parts without camera supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse-view 3D reconstruction for independent articulations

Progressive disentanglement strategy separating static from moving parts

Self-supervised loss enforcing cross-view and cross-pose consistency
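To make the idea of cross-pose consistency concrete, here is a toy sketch under strong assumptions; it is not the loss used in the paper. It assumes a single revolute joint, a known moving-part mask, and two point sets already expressed in a common frame (camera motion removed). A candidate joint, given by an axis, a pivot, and an angle, is applied to the first state, and the mean squared distance to the second state serves as the consistency residual. All function and parameter names are hypothetical.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])      # cross-product matrix
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def articulate(pts, moving, axis, pivot, angle):
    """Rotate only the moving points about the joint line through `pivot`."""
    R = rodrigues(axis, angle)
    out = pts.copy()
    out[moving] = (pts[moving] - pivot) @ R.T + pivot
    return out

def cross_pose_loss(pts_a, pts_b, moving, axis, pivot, angle):
    """Mean squared distance between the articulated state A and state B."""
    pred_b = articulate(pts_a, moving, axis, pivot, angle)
    return float(np.mean(np.sum((pred_b - pts_b) ** 2, axis=1)))
```

The residual vanishes when the candidate joint reproduces the observed motion and grows as the angle, axis, or pivot drifts from it, so it could in principle be minimized over those parameters. A full pipeline would compare rendered images across views rather than 3D points.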
