GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In household settings, articulated object manipulation suffers from depth perception failures caused by transparent or reflective materials and generalizes poorly across part-level interactions. To address these challenges, we introduce the first large-scale, material-agnostic articulated object manipulation dataset, combining photorealistic material randomization with fine-grained, part-level annotations of scene-level executable interaction poses. We propose a part-centric, material-invariant data paradigm and a modular neural framework that jointly integrates physics-based rendering synthesis, part-level semantic and kinematic modeling, depth estimation, and interaction pose optimization. Experiments demonstrate that our method improves depth estimation accuracy by 18.3% and executable pose prediction accuracy by 22.7% over state-of-the-art methods in both simulation and real-world scenarios. It further exhibits strong cross-material and cross-form generalization, as well as robustness to challenging optical properties.
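To make the modular framework's structure concrete, below is a minimal Python sketch of such a two-stage pipeline: restored depth is back-projected to a point cloud, and a pose predictor ranks part-level interaction candidates. All class and function names here (DepthRestorer, PartPosePredictor, backproject) are illustrative assumptions, not the authors' actual API.

```python
# Minimal sketch of a two-stage, part-centric manipulation pipeline in the
# spirit of the summary above. All class and function names are hypothetical
# placeholders, not the authors' actual API.
from dataclasses import dataclass

import numpy as np


@dataclass
class InteractionPose:
    """A candidate end-effector pose with a predicted actionability score."""
    position: np.ndarray  # (3,) gripper position in the camera frame
    rotation: np.ndarray  # (3, 3) gripper orientation
    score: float          # higher means more likely to be executable


class DepthRestorer:
    """Stage 1: repair raw sensor depth corrupted by transparent or
    reflective materials, using the RGB image as guidance."""

    def restore(self, rgb: np.ndarray, raw_depth: np.ndarray) -> np.ndarray:
        raise NotImplementedError  # e.g. a learned RGB-D completion network


class PartPosePredictor:
    """Stage 2: predict part-level, scene-level actionable interaction
    poses from the restored geometry."""

    def predict(self, points: np.ndarray) -> list[InteractionPose]:
        raise NotImplementedError  # e.g. a point-cloud network with pose heads


def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud
    using the pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)


def manipulate(rgb, raw_depth, K, restorer: DepthRestorer,
               predictor: PartPosePredictor) -> InteractionPose:
    """Restore depth, lift it to a point cloud, then pick the
    highest-scoring part-level interaction pose."""
    depth = restorer.restore(rgb, raw_depth)
    points = backproject(depth, K)
    poses = predictor.predict(points)
    return max(poses, key=lambda p: p.score)
```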

📝 Abstract
Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduce a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluate the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we propose a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios. More information and demos can be found at: https://pku-epic.github.io/GAPartManip/.
Problem

Research questions and friction points this paper is trying to address.

Challenges in manipulating articulated objects due to imperfect depth perception
Lack of diverse part-based interactions for flexible manipulation
Need for a material-agnostic dataset to improve depth and pose prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale part-centric dataset for manipulation
Photo-realistic material randomization technique (see the sketch after this list)
Modular framework for robust object manipulation
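As a concrete illustration of the material-randomization idea, the sketch below samples physically based rendering (PBR) parameters so that the same part geometry can be rendered as diffuse, reflective, or transparent. The parameter names and sampling ranges follow common PBR conventions and are chosen purely for illustration; they are not taken from the paper.

```python
# Illustrative sketch of photo-realistic material randomization for
# synthetic data: sample PBR parameters so the same part geometry is
# rendered as diffuse, reflective, or transparent. Not the authors'
# actual pipeline; ranges are common-convention assumptions.
import random
from dataclasses import dataclass


@dataclass
class PBRMaterial:
    base_color: tuple[float, float, float]
    metallic: float       # 0 = dielectric, 1 = mirror-like metal
    roughness: float      # 0 = glossy, 1 = matte
    transmission: float   # 0 = opaque, 1 = fully transparent (glass-like)
    ior: float            # index of refraction for transmissive materials


def sample_material(rng: random.Random) -> PBRMaterial:
    """Draw one material, biased toward the optically hard cases
    (transparent and reflective) that break commodity depth sensors."""
    kind = rng.choices(["diffuse", "reflective", "transparent"],
                       weights=[0.4, 0.3, 0.3])[0]
    color = tuple(rng.uniform(0.05, 0.95) for _ in range(3))
    if kind == "reflective":
        return PBRMaterial(color, metallic=rng.uniform(0.8, 1.0),
                           roughness=rng.uniform(0.0, 0.2),
                           transmission=0.0, ior=1.45)
    if kind == "transparent":
        return PBRMaterial(color, metallic=0.0,
                           roughness=rng.uniform(0.0, 0.1),
                           transmission=rng.uniform(0.8, 1.0),
                           ior=rng.uniform(1.3, 1.6))
    return PBRMaterial(color, metallic=0.0,
                       roughness=rng.uniform(0.4, 0.9),
                       transmission=0.0, ior=1.45)


if __name__ == "__main__":
    rng = random.Random(0)
    for part in ["lid", "handle", "drawer_front"]:
        print(part, sample_material(rng))
```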
👥 Authors
Wenbo Cui
Institute of Automation, Chinese Academy of Sciences; Beijing Academy of Artificial Intelligence
Chengyang Zhao
Carnegie Mellon University
Robotics · Machine Learning · 3D Computer Vision
Songlin Wei
University of Southern California; previously Peking University
Robotics · 3D Vision
Jiazhao Zhang
Peking University
Embodied AI · Navigation · 3D Vision
Haoran Geng
PhD Student, UC Berkeley
Robotics · Computer Vision · Reinforcement Learning
Yaran Chen
Institute of Automation, Chinese Academy of Sciences
He Wang
CFCS, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence; Galbot