LVLMs and Humans Ground Differently in Referential Communication

📅 2026-01-27
📈 Citations: 1
Influential: 1

🤖 AI Summary
This study addresses a limitation of large vision-language models (LVLMs): they do not effectively model common ground during collaborative tasks, which constrains their ability to understand and generate referring expressions. Using a director-matcher paradigm, the work compares multi-turn referential communication across four pairing types (human-human, human-AI, AI-human, and AI-AI) in a matching task with pictures of objects that have no obvious lexicalized labels. With a factorial design and an interactive online platform, it analyzes 356 dialogues (89 pairs over 4 rounds each) for accuracy, efficiency, and lexical overlap. The results indicate that LVLMs are weaker than humans at dynamically establishing common ground and at interactively resolving referring expressions. The project publicly releases the data-collection pipeline, the analysis tools, and the dialogue corpus as a benchmark resource for future research.

📝 Abstract
For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. Here, we present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact over multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We release the online pipeline for data collection, the tools and analyses for accuracy, efficiency, and lexical overlap, and a corpus of 356 dialogues (89 pairs over 4 rounds each) that unmasks LVLMs' limitations in interactively resolving referring expressions, a crucial skill that underlies human language use.
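The abstract names lexical overlap as one of the analyses but does not define the measure. As a rough illustration only, the sketch below computes Jaccard overlap of word types between two referring expressions for the same picture in different rounds. The choice of Jaccard, the regex tokenizer, the function names, and the example utterances are all assumptions made here, not the authors' released tooling.

```python
import re
from typing import Set


def tokens(utterance: str) -> Set[str]:
    """Return the set of lowercase word types in an utterance.

    Assumption: a simple regex tokenizer; the paper's tools may differ.
    """
    return set(re.findall(r"[a-z']+", utterance.lower()))


def jaccard_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word types between two referring expressions."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # convention chosen here for two empty utterances
    return len(ta & tb) / len(ta | tb)


# Hypothetical director descriptions of one picture in two rounds.
round1 = "the one that looks like a dancer with one arm raised"
round2 = "the dancer"

print(jaccard_overlap(round1, round2))
```

A measure of this kind lets one check whether a pair reuses earlier wording as rounds repeat, which is one way to look for convergence on shared referring expressions.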
Problem

Research questions and friction points this paper is trying to address.

referential communication
common ground
large vision-language models
human-AI collaboration
referring expressions
Innovation

Methods, ideas, or system contributions that make the work stand out.

referential communication
large vision-language models
common ground
interactive dialogue
lexical overlap