🤖 AI Summary
To address the limited generalization of affordance grounding models to unseen objects and novel affordances in embodied intelligence, this paper introduces AffordanceSAM, a framework that extends SAM to functional region identification. It combines an affordance-adaption module with a coarse-to-fine training recipe: the model first becomes coarsely aware of affordance objects and actions, then learns to generate fine-grained affordance heatmaps, which transfers a foundational segmentation model into functional semantic space. Evaluated on the AGD20K benchmark, the approach surpasses previous methods and shows evidence of generalizing to both unseen objects and novel affordances. The work aims to provide a function-aware perception foundation for open-world interaction in embodied AI systems.
📝 Abstract
Improving the ability of affordance grounding models to localize functional regions for unseen objects and unseen affordance functions is crucial for real-world applications. However, current models still fall well short of this standard. To address this problem, we introduce AffordanceSAM, an effective approach that extends SAM's generalization capacity to the domain of affordance grounding. To transfer SAM's robust segmentation performance to affordance grounding, we first propose an affordance-adaption module that adapts SAM's segmentation output to the specific functional regions required for affordance grounding. We also design a coarse-to-fine training recipe in which SAM first learns to recognize affordance objects and actions coarsely and then learns to generate fine-grained affordance heatmaps. Both quantitative and qualitative experiments demonstrate the strong generalization capacity of AffordanceSAM, which not only surpasses previous methods on the AGD20K benchmark but also shows evidence of handling novel objects and affordance functions.
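The coarse-to-fine idea can be illustrated with a toy two-stage objective. The sketch below is not the authors' implementation: the abstract does not specify the losses, so the binary cross-entropy used for the coarse stage, the KL divergence used for the fine stage, and all function names (`logits_to_heatmap`, `coarse_loss`, `fine_loss`) are assumptions chosen only to show how coarse region supervision could precede heatmap supervision.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_heatmap(logits):
    """Turn a 2D grid of mask logits into a heatmap whose values sum to 1."""
    probs = [[sigmoid(v) for v in row] for row in logits]
    total = sum(sum(row) for row in probs)
    return [[p / total for p in row] for row in probs]


def coarse_loss(logits, mask):
    """Assumed stage 1: mean binary cross-entropy against a coarse 0/1 mask."""
    eps = 1e-7
    terms = []
    for logit_row, mask_row in zip(logits, mask):
        for logit, m in zip(logit_row, mask_row):
            p = min(max(sigmoid(logit), eps), 1.0 - eps)
            terms.append(-(m * math.log(p) + (1 - m) * math.log(1.0 - p)))
    return sum(terms) / len(terms)


def fine_loss(logits, target_heatmap):
    """Assumed stage 2: KL(target || prediction) between normalized heatmaps."""
    eps = 1e-7
    pred = logits_to_heatmap(logits)
    target_total = sum(sum(row) for row in target_heatmap)
    kl = 0.0
    for pred_row, target_row in zip(pred, target_heatmap):
        for p, t in zip(pred_row, target_row):
            t = t / target_total  # normalize the target to a distribution
            if t > 0:
                kl += t * math.log(t / (p + eps))
    return kl


# Toy 2x2 example: the logits favor the diagonal cells.
logits = [[2.0, -2.0], [-2.0, 2.0]]
heatmap = logits_to_heatmap(logits)
print(coarse_loss(logits, [[1, 0], [0, 1]]))  # small: logits agree with the mask
print(fine_loss(logits, heatmap))             # close to zero: prediction matches target
```

In a real training loop the first loss would be applied for an initial phase and the second afterwards; how the phases are scheduled and weighted in AffordanceSAM is not stated in the abstract.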