Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model

📅 2025-08-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing affordance grounding models lack chain-of-thought (CoT) reasoning, which hinders their ability to model affordances shared across objects and limits out-of-domain generalization and explicit reasoning. To address this, we propose Affordance-R1, the first unified affordance grounding framework that integrates cognitive CoT reasoning with Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm. Trained without explicit reasoning annotations, the model achieves zero-shot generalization and exhibits emergent test-time reasoning. Our approach builds on a multimodal large language model and uses a reward function with three components: format compliance, perceptual grounding, and cognitive reasoning. We further construct ReasonAff, an affordance-centric reasoning dataset that supports training. Experiments show that the model outperforms well-established methods, with robust zero-shot transfer and open-world generalization. The code and dataset are publicly released.

📝 Abstract

Affordance grounding focuses on predicting the specific regions of objects that are associated with the actions to be performed by robots. It plays a vital role in human-robot interaction, human-object interaction, embodied manipulation, and embodied perception. Existing models often neglect the affordances shared among different objects because they lack Chain-of-Thought (CoT) reasoning abilities, which limits their out-of-domain (OOD) generalization and explicit reasoning capabilities. To address these challenges, we propose Affordance-R1, the first unified affordance grounding framework that integrates cognitive-CoT-guided Group Relative Policy Optimization (GRPO) within a reinforcement learning paradigm. Specifically, we designed a sophisticated affordance reward function, which contains format, perception, and cognition rewards to effectively guide optimization directions. Furthermore, we constructed a high-quality affordance-centric reasoning dataset, ReasonAff, to support training. Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Affordance-R1 achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Comprehensive experiments demonstrate that our model outperforms well-established methods and exhibits open-world generalization. To the best of our knowledge, Affordance-R1 is the first to apply GRPO-based RL with reasoning to affordance reasoning. The code of our method and our dataset are released at https://github.com/hq-King/Affordance-R1.
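The abstract names GRPO but does not spell out how it scores responses. As a rough illustration of the general idea (not the authors' training code), GRPO samples a group of responses for the same prompt and scores each one relative to the rest of its group, so no separate value network is needed. The sketch below shows only that normalization step. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled response to the same prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one affordance query.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Responses that score above their group's average get a positive advantage and are reinforced. Those below average get a negative one.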

Problem

Research questions and friction points this paper is trying to address.

Enhancing affordance grounding for robot action regions

Improving generalization via Chain-of-Thought reasoning

Integrating reinforcement learning with cognitive rewards

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates GRPO-based RL for affordance reasoning

Uses format, perception, and cognition rewards

Constructs ReasonAff dataset for training
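The three reward types listed above could be combined along the following lines. This is a minimal sketch under stated assumptions, not the reward defined in the paper. The `<think>`/`<answer>` tag layout, the use of box IoU as the perception signal, the caller-supplied `cognition_score`, and the equal weights are all hypothetical choices made for illustration.

```python
import re

# Assumed output layout: reasoning in <think> tags, then the result in <answer> tags.
THINK_ANSWER = re.compile(r"<think>.+?</think>\s*<answer>.+?</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the whole response follows the assumed tag layout, else 0.0."""
    return 1.0 if THINK_ANSWER.fullmatch(response.strip()) else 0.0


def box_iou(a, b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def total_reward(response, pred_box, gt_box, cognition_score, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of format, perception, and cognition terms.

    `cognition_score` is a hypothetical value in [0, 1] supplied by the caller,
    for example a similarity between predicted and reference affordance labels.
    """
    w_format, w_perception, w_cognition = weights
    return (
        w_format * format_reward(response)
        + w_perception * box_iou(pred_box, gt_box)
        + w_cognition * cognition_score
    )


response = "<think>The handle is the graspable part.</think><answer>[10, 20, 60, 80]</answer>"
print(total_reward(response, (10, 20, 60, 80), (12, 22, 58, 84), cognition_score=0.8))
```

Keeping the three terms as separate functions makes it easy to inspect which component drives the reward for a given response.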

🔎 Similar Papers

Text2Afford: Probing Object Affordance Prediction abilities of Language Models solely from Text