Learning Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 2D foundation features struggle to model the functional parts of articulated objects, suffering from multi-view inconsistency, insufficient geometric information, and low inference efficiency. This work proposes the Part-Aware Dense 3D Feature Field (PA3FF), which introduces part awareness into end-to-end feedforward 3D feature prediction for the first time. Trained via contrastive learning on large-scale 3D part-annotated data, PA3FF takes point clouds as input and directly generates a continuous feature field in which feature distances reflect functional-semantic proximity. On top of this representation, the authors develop a Part-Aware Diffusion Policy (PADP) for robotic manipulation. Experiments show that PA3FF consistently outperforms CLIP, DINOv2, and Grounded-SAM in both simulated and real-world settings, substantially improving cross-category and cross-morphology generalization for manipulation while also supporting downstream tasks such as correspondence learning and segmentation.

📝 Abstract
Articulated object manipulation is essential for various real-world robotic tasks, yet generalizing across diverse objects remains a major challenge. A key to generalization lies in understanding functional parts (e.g., door handles and knobs), which indicate where and how to manipulate across diverse object categories and shapes. Previous works attempted to achieve generalization by introducing foundation features, but these features are mostly 2D-based and do not specifically consider functional parts. Lifting these 2D features into geometry-rich 3D space raises further challenges, such as long runtimes, multi-view inconsistencies, and low spatial resolution with insufficient geometric information. To address these issues, we propose the Part-Aware 3D Feature Field (PA3FF), a novel dense 3D feature with part awareness for generalizable articulated object manipulation. PA3FF is trained on 3D part proposals from a large-scale labeled dataset via a contrastive learning formulation. Given point clouds as input, PA3FF predicts a continuous 3D feature field in a feedforward manner, where the distance between point features reflects the proximity of functional parts: points with similar features are more likely to belong to the same part. Building on this feature, we introduce the Part-Aware Diffusion Policy (PADP), an imitation learning framework aimed at enhancing sample efficiency and generalization for robotic manipulation. We evaluate PADP on several simulated and real-world tasks, demonstrating that PA3FF consistently outperforms a range of 2D and 3D representations in manipulation scenarios, including CLIP, DINOv2, and Grounded-SAM. Beyond imitation learning, PA3FF enables diverse downstream methods, including correspondence learning and segmentation tasks, making it a versatile foundation for robotic manipulation. Project page: https://pa3ff.github.io
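The abstract describes a contrastive objective under which points on the same functional part are pulled together in feature space while points on different parts are pushed apart. As a rough illustration only (not the paper's actual loss; the function name, temperature value, and NumPy formulation are assumptions), a supervised InfoNCE-style contrastive loss over per-point features with part labels could be sketched as:

```python
import numpy as np

def part_contrastive_loss(features, part_ids, temperature=0.1):
    """Illustrative supervised contrastive loss over per-point features.

    features: (N, D) array of per-point feature vectors.
    part_ids: (N,) integer part label per point; points sharing a label
              are treated as positives, all others as negatives.
    Assumes every part contributes at least two points to the batch.
    """
    # L2-normalize so dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = (f @ f.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - (row_max +
                      np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True)))

    # Positive mask: same part label, excluding the point itself.
    pos = (part_ids[:, None] == part_ids[None, :]) \
          & ~np.eye(len(part_ids), dtype=bool)

    # Negative mean log-likelihood of each point's positives.
    per_point = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return float(per_point.mean())
```

Under this kind of objective, feature distance directly encodes part membership, which is the property the abstract attributes to PA3FF: well-separated part clusters yield a low loss, while features that mix parts yield a high one.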
Problem

Research questions and friction points this paper is trying to address.

articulated object manipulation
generalization
functional parts
3D feature representation
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Part-Aware 3D Feature Field
Articulated Object Manipulation
Contrastive Learning
Generalizable Robotics
Dense 3D Representation