PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos

📅 2026-03-26
🤖 AI Summary
Existing approaches to perceiving the motion and structure of articulated objects rely heavily on high-quality 3D data and manual annotations, which limits their scalability. This work proposes PAWS, a self-supervised framework that, for the first time, learns articulated object dynamics directly from large-scale in-the-wild egocentric videos of natural hand-object interactions, without requiring fine-grained 3D annotations or synthetic data. The method combines egocentric video analysis with hand-object interaction modeling and significantly outperforms existing baselines on the HD-EPIC and Arti4D datasets. The extracted articulations also improve the fine-tuning of 3D articulation prediction models and enable robot manipulation.

📝 Abstract
Articulation perception aims to recover the motion and structure of articulated objects (e.g., drawers and cupboards), and is fundamental to 3D scene understanding in robotics, simulation, and animation. Existing learning-based methods rely heavily on supervised training with high-quality 3D data and manual annotations, limiting scalability and diversity. To address this limitation, we propose PAWS, a method that directly extracts object articulations from hand-object interactions in large-scale in-the-wild egocentric videos. We evaluate our method on public datasets, including HD-EPIC and Arti4D, achieving significant improvements over baselines. We further demonstrate that the extracted articulations benefit downstream tasks, including fine-tuning 3D articulation prediction models and enabling robot manipulation. See the project website at https://aaltoml.github.io/PAWS/.
Problem

Research questions and friction points this paper is trying to address.

articulation perception
egocentric videos
3D scene understanding
scalability
diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

articulation perception
egocentric videos
unsupervised learning
hand-object interaction
3D scene understanding