Text2Afford: Probing Object Affordance Prediction abilities of Language Models solely from Text

📅 2024-02-20
🏛️ Conference on Computational Natural Language Learning
📈 Citations: 5
Influential: 0
🤖 AI Summary
This work investigates the limitations of pretrained language models (PTLMs) and vision-language models (VLMs) in text-only affordance reasoning, particularly for unconventional or rare object functions. To address this, the authors propose the first purely text-driven, sentence-level affordance probing framework; construct Text2Afford, the first in-the-wild language-grounding dataset annotated with 15 fine-grained affordance classes; and introduce a consistency verification mechanism. They conduct a unified cross-architecture evaluation of PTLMs and VLMs via prompt engineering and few-shot fine-tuning. Experiments reveal that state-of-the-art PTLMs achieve below 45% accuracy on unconventional affordances; that VLMs show no significant gain from the visual modality; and that few-shot fine-tuning yields an average 22.6% improvement, confirming that affordance knowledge is learnable with limited supervision. This study is the first to systematically expose structural deficits in multimodal models' implicit affordance knowledge, and it establishes a new benchmark and methodology for text-driven functional understanding.
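The probing setup described above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual code: the affordance labels shown are a hypothetical subset of the 15 classes, and the prompt template and `score_fn` hook are assumptions about how a sentence-level, text-only probe might be framed.

```python
# Illustrative sketch of sentence-level, text-only affordance probing.
# The labels below are a HYPOTHETICAL subset of the paper's 15 classes,
# and the yes/no template is an assumption, not the authors' exact prompt.

AFFORDANCES = ["grasp", "lift", "throw", "push", "sit_on"]  # hypothetical subset


def build_probe(sentence: str, obj: str, affordance: str) -> str:
    """Format a (context sentence, object, affordance) triple as a yes/no query."""
    return (
        f"Context: {sentence}\n"
        f"Question: Can a person typically {affordance.replace('_', ' ')} "
        f"the {obj}? Answer yes or no.\n"
        f"Answer:"
    )


def probe_all(sentence: str, obj: str, score_fn) -> dict:
    """Score every affordance class for one object using a caller-supplied
    LM scoring function (e.g. probability mass on the token 'yes')."""
    return {a: score_fn(build_probe(sentence, obj, a)) for a in AFFORDANCES}


if __name__ == "__main__":
    # Dummy scorer standing in for a real LM call.
    dummy = lambda prompt: 0.5
    scores = probe_all("A wooden chair stood by the window.", "chair", dummy)
    print(scores)
```

In a real run, `score_fn` would wrap a pretrained LM and return a plausibility score per class, so accuracy can be computed against the gold affordance annotations; the few-shot fine-tuning variant would train on a handful of labeled probes before scoring.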

📝 Abstract
We investigate the knowledge of object affordances in pre-trained language models (PTLMs) and pre-trained vision-language models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. To take a first step toward quantifying the effect of grounding (or lack thereof), we curate a novel and comprehensive dataset of object affordances – Text2Afford, characterized by 15 affordance classes. Unlike affordance datasets collected in vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improvement in affordance knowledge in PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks, and presents insights into LM capabilities, advancing the understanding of object affordances.
Problem

Research questions and friction points this paper is trying to address.

Probing object affordance prediction abilities in language models
Evaluating reasoning limitations for uncommon object affordances
Assessing vision-language models' effectiveness in capturing affordances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probes object affordance knowledge in language models
Creates novel Text2Afford dataset with 15 classes
Improves models through few-shot fine-tuning approach