🤖 AI Summary
This study investigates the capacity of vision-language models (VLMs) to perform context-sensitive pragmatic reasoning in multi-turn dialogues, using iterative reference games as a canonical pragmatic task. Methodologically, we systematically manipulate contextual factors—including quantity, ordering, and relevance—and conduct few-shot evaluations against human baselines. Results show that relevant contextual information substantially improves VLM performance, enabling pragmatic inference accuracy approaching human-level competence; however, models remain significantly deficient when relevant context is absent or when resolving abstract references. This work provides the first empirical evidence that contextual relevance is a decisive factor in VLM pragmatic understanding, identifies dynamic context modeling as a critical bottleneck in current architectures, and establishes a rigorous evaluation framework to assess multimodal pragmatic reasoning. The findings offer concrete empirical grounding for developing more human-consistent, context-aware VLMs.
📝 Abstract
Iterated reference games - in which players repeatedly pick out novel referents using language - present a test case for agents' ability to perform context-sensitive pragmatic reasoning in multi-turn linguistic environments. We tested humans and vision-language models on trials from iterated reference games, varying the given context in terms of amount, order, and relevance. Without relevant context, models performed above chance but substantially worse than humans. With relevant context, however, model performance increased dramatically over trials. Few-shot reference games with abstract referents remain a difficult task for machine learning models.