Towards Understanding Ambiguity Resolution in Multimodal Inference of Meaning

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the cognitive mechanisms by which foreign language learners infer the meanings of unfamiliar words in multimodal picture–sentence contexts. It combines controlled experiments (target-word masking with illustrated sentences), quantitative analysis of multimodal features (image saliency, syntactic and semantic textual cues), and cross-linguistic participant comparisons with human behavioral analysis and an evaluation of AI reasoning models, establishing the first multimodal inference framework designed specifically for semantic ambiguity resolution. Results show that intuitive cues (e.g., image centrality) only weakly predict inference accuracy, whereas verb semantic roles in the text and image–object interaction are stronger predictors. Crucially, native language typology significantly modulates inference strategies. These findings provide empirical foundations for personalized vocabulary instruction and identify key bottlenecks, along with corresponding optimization pathways, in AI models’ simulation of human multimodal semantic reasoning.

📝 Abstract
We investigate a new setting for foreign language learning, where learners infer the meaning of unfamiliar words in a multimodal context of a sentence describing a paired image. We conduct studies with human participants using different image-text pairs. We analyze the features of the data (i.e., images and texts) that make it easier for participants to infer the meaning of a masked or unfamiliar word, and which language backgrounds of the participants correlate with success. We find that only some intuitive features correlate strongly with participant performance, which calls for further investigation of predictive features for success in these tasks. We also analyze the ability of AI systems to reason about participant performance, and discover promising future directions for improving this reasoning ability.
Problem

Research questions and friction points this paper is trying to address.

Investigating multimodal foreign language learning through image-text pairs
Identifying features that facilitate meaning inference of unfamiliar words
Analyzing AI systems' ability to predict human learning performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal context for foreign language learning
Analyzing image-text features for word inference
AI systems reasoning about participant performance
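The feature analysis mentioned in the abstract (relating properties of an image-text pair to whether participants infer the masked word) can be pictured with a small correlation sketch. This is only an illustration under stated assumptions: the feature names `object_centrality` and `sentence_length`, the trial records, and the choice of Pearson correlation are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Returns 0.0 when either sequence has zero variance, since the
    coefficient is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Made-up trials: each record holds hypothetical feature values for one
# image-text pair and whether the participant inferred the masked word
# (1 = correct, 0 = incorrect). None of this is data from the paper.
trials = [
    {"object_centrality": 0.9, "sentence_length": 8, "correct": 1},
    {"object_centrality": 0.2, "sentence_length": 14, "correct": 0},
    {"object_centrality": 0.7, "sentence_length": 6, "correct": 1},
    {"object_centrality": 0.4, "sentence_length": 11, "correct": 0},
    {"object_centrality": 0.6, "sentence_length": 9, "correct": 1},
]

outcomes = [t["correct"] for t in trials]
for feature in ("object_centrality", "sentence_length"):
    values = [t[feature] for t in trials]
    print(f"{feature}: r = {pearson(values, outcomes):+.2f}")
```

With a binary outcome, this Pearson coefficient is the point-biserial correlation. A real analysis would also need significance testing and some control for participant-level effects, which this sketch leaves out.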
Yufei Wang
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
Adriana Kovashka
Associate Professor, University of Pittsburgh
Computer Vision, Machine Learning, Artificial Intelligence
Loretta Fernández
Department of Teaching, Learning, and Leading, University of Pittsburgh, Pittsburgh, PA, USA
Marc N. Coutanche
Department of Psychology and Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA, USA
Seth Wiener
Carnegie Mellon University
Psycholinguistics, Phonetics, Second Language Acquisition, Chinese Linguistics