🤖 AI Summary
To address the need for interpretable understanding of functional regions (e.g., graspable or pressable areas) in robotic autonomous manipulation and human–robot interaction, this paper introduces the first 3D point-cloud-based functional region detection method. It pioneers the integration of probabilistic prototype learning into this task. Built upon a PointNet++ backbone, the approach jointly learns functional region localization and human-interpretable explanations via probabilistic prototype matching, soft attention mechanisms, and local geometric encoding. Unlike black-box models, it achieves state-of-the-art accuracy on 3D-AffordanceNet (improving mAP by 1.2%), while simultaneously generating faithful, semantically grounded explanations: each predicted region is explicitly linked to a human-understandable training prototype (e.g., “similar to a canonical grasping prototype”). This work establishes a novel, trustworthy paradigm for explainable 3D functional reasoning.
📝 Abstract
Robotic agents need to understand how to interact with objects in their environment, both autonomously and during human-robot interactions. Affordance detection on 3D point clouds, which identifies object regions that allow specific interactions, has traditionally relied on deep learning models like PointNet++, DGCNN, or PointTransformerV3. However, these models operate as black boxes, offering no insight into their decision-making processes. Prototypical learning methods, such as ProtoPNet, provide an interpretable alternative to black-box models by employing a "this looks like that" case-based reasoning approach. However, they have been primarily applied to image-based tasks. In this work, we apply prototypical learning to models for affordance detection on 3D point clouds. Experiments on the 3D-AffordanceNet benchmark dataset show that prototypical models achieve competitive performance with state-of-the-art black-box models and offer inherent interpretability. This makes prototypical models a promising candidate for human-robot interaction scenarios that require increased trust and safety.
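To make the "this looks like that" idea concrete, the following is a minimal sketch of prototype matching on per-point features. It is not the authors' implementation. It assumes each point already has a feature vector from some backbone, uses the log-ratio similarity from the original ProtoPNet formulation, and simplifies the final classification to "take the label of the best-matching prototype" (ProtoPNet itself feeds the similarities into a linear layer). The function names, toy features, and affordance labels are hypothetical and chosen only for illustration.

```python
import math


def prototype_similarity(feature, prototype, eps=1e-4):
    """ProtoPNet-style similarity: grows as the squared distance shrinks."""
    d2 = sum((f - p) ** 2 for f, p in zip(feature, prototype))
    return math.log((d2 + 1.0) / (d2 + eps))


def explain_points(point_features, prototypes, prototype_labels):
    """For each point, return (label, prototype index, similarity)
    of its best-matching prototype. Simplification: nearest prototype
    decides the label instead of a learned linear layer."""
    results = []
    for feat in point_features:
        sims = [prototype_similarity(feat, p) for p in prototypes]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((prototype_labels[best], best, sims[best]))
    return results


# Toy data (assumption): 2-D vectors standing in for learned per-point embeddings.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
prototype_labels = ["grasp", "press"]  # illustrative affordance labels
point_features = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1]]

for label, idx, sim in explain_points(point_features, prototypes, prototype_labels):
    print(f"point looks like prototype {idx} ({label}), similarity {sim:.2f}")
```

The explanation for each point is the prototype it matched, which is what distinguishes this style of model from a black-box classifier that only outputs a score.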