🤖 AI Summary
This work addresses robots' limited ability to identify which regions of an object can be manipulated when they are given complex language instructions. We propose a reasoning-driven affordance segmentation framework that combines a Chain-of-Thought (CoT) cold-start strategy with reinforcement learning. The method generates grasp candidates from the global scene point cloud and then filters them in a context-aware way using language-conditioned affordance masks. By unifying spatial reasoning, semantic alignment, and grasp decision-making, the approach outperforms state-of-the-art methods on multiple benchmark datasets and shows strong robustness and generalization in real-world robotic experiments.
📝 Abstract
We introduce AffordanceGrasp-R1, a reasoning-driven affordance segmentation framework for robotic grasping that combines a chain-of-thought (CoT) cold-start strategy with reinforcement learning to enhance deduction and spatial grounding. In addition, we redesign the grasping pipeline to be more context-aware by generating grasp candidates from the global scene point cloud and subsequently filtering them using instruction-conditioned affordance masks. Extensive experiments demonstrate that AffordanceGrasp-R1 consistently outperforms state-of-the-art (SOTA) methods on benchmark datasets, and real-world robotic grasping evaluations further validate its robustness and generalization under complex language-conditioned manipulation scenarios.
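The abstract does not say how grasp candidates are matched against the affordance mask, so the following is only a minimal sketch of the "generate globally, then filter by mask" idea. It assumes that each candidate can be reduced to a 3D grasp center with a confidence score, and that the instruction-conditioned mask is a per-point boolean over the scene point cloud. Under those assumptions, a candidate is kept if enough masked points lie within a fixed radius of its center. The names `Grasp`, `filter_grasps_by_affordance`, `radius`, and `min_support` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Grasp:
    # Hypothetical candidate representation: grasp center in the scene frame
    # plus a confidence score from whatever global grasp generator is used.
    position: Point
    score: float


def filter_grasps_by_affordance(
    grasps: Sequence[Grasp],
    points: Sequence[Point],
    affordance_mask: Sequence[bool],
    radius: float = 0.02,
    min_support: int = 1,
) -> List[Grasp]:
    """Keep grasps whose center has at least `min_support` affordance points
    within `radius`, sorted by descending score (assumed filtering rule)."""
    if len(points) != len(affordance_mask):
        raise ValueError("affordance_mask must have one entry per point")

    # Restrict the scene point cloud to points the mask marks as affordance.
    affordance_points = [p for p, m in zip(points, affordance_mask) if m]

    kept = []
    for grasp in grasps:
        # Count masked points in a ball around the grasp center (brute force;
        # a KD-tree would be the usual choice for large clouds).
        support = sum(
            1 for p in affordance_points if math.dist(grasp.position, p) <= radius
        )
        if support >= min_support:
            kept.append(grasp)

    # Rank the surviving candidates by the generator's confidence.
    kept.sort(key=lambda g: g.score, reverse=True)
    return kept
```

Because the candidates are generated from the whole scene rather than from a cropped object, the generator still sees surrounding geometry, which matters for collisions. The mask then only decides which of those scene-level grasps match the instruction.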