🤖 AI Summary
Existing point-level affordance methods are limited to single-object scenes with homogeneous agents. They do not model real-world constraints such as occlusion, scene geometry, and robot embodiment, which leads to poor generalization in complex scenes. This work introduces an environment-aware affordance framework for home-assistant robots. The approach jointly models the environment and the object by integrating 3D articulation structure understanding, occlusion-aware geometry, and robot kinematic constraints. It proposes a contrastive affordance learning mechanism that generalizes from training on scenes with a single occluder to deployment in multi-occluder configurations. Built on point-cloud representations and contrastive learning, the method significantly improves accuracy in locating movable parts and predicting manipulation intent under occlusion, on both synthetic and real-world datasets. Experiments demonstrate strong generalization across varying occlusion complexity, supporting the framework's robustness for practical robotic interaction.
📝 Abstract
Perceiving and manipulating 3D articulated objects in diverse environments is essential for home-assistant robots. Recent studies have shown that point-level affordance provides actionable priors for downstream manipulation tasks. However, existing work primarily focuses on single-object scenarios with homogeneous agents, overlooking the realistic constraints imposed by the environment and the agent's morphology, e.g., occlusions and physical limitations. In this paper, we propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints. Unlike object-centric affordance approaches, learning environment-aware affordance faces a combinatorial explosion, because occlusions vary in their quantities, geometries, positions, and poses. To address this and enhance data efficiency, we introduce a novel contrastive affordance learning framework that trains on scenes containing a single occluder and generalizes to scenes with complex occluder combinations. Experiments demonstrate the effectiveness of our approach in learning affordance under environment constraints. Project page: https://chengkaiacademycity.github.io/EnvAwareAfford/
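The abstract does not spell out the contrastive objective, so the following is only a generic illustration of what a contrastive loss computes, not the authors' formulation. It is a minimal InfoNCE-style sketch in plain Python. The function names, the temperature value, and the idea of treating embeddings as plain lists of floats are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Generic InfoNCE loss (hypothetical helper, not the paper's objective).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp over the positive and all negatives.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(v - max_logit) for v in logits)
    )
    return log_denominator - pos_logit
```

In a setting like the one described, one could imagine the anchor being a feature of a candidate contact point in an occluded scene, the positive a feature with the same interaction outcome, and the negatives features with a different outcome. That pairing is a guess at how such a loss might be applied, not something the abstract states.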