🤖 AI Summary
This study investigates systematic cognitive biases in multimodal large language models (MLLMs) regarding quantifier representation, focusing on three cross-linguistically robust features of human quantitative cognition: quantifier ordering scales, usage scope/typicality, and Approximate Number System (ANS)-related biases.
Method: Leveraging psycholinguistic paradigms and multilingual comparative experiments, the authors design a quantification comprehension evaluation task, introducing for the first time a three-dimensional human cognitive framework into MLLM assessment.
Contribution/Results: We find that MLLMs exhibit systematic deviations in both semantic and pragmatic quantifier representations. These biases are significantly modulated by model architecture (e.g., presence of visual grounding capabilities) and linguistic typology (e.g., classifier richness, numeral system properties). Our findings reveal fundamental limitations in current MLLMs’ numerical reasoning and provide both theoretical grounding and empirical benchmarks for developing cognitively aligned multimodal quantification models.
📝 Abstract
Quantification has proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, given that quantification interfaces with the logical, pragmatic, and numerical domains, the exact reasons for this poor performance remain unclear. This paper examines three key features of human quantification shared cross-linguistically that have so far remained unexplored in the (M)LLM literature: the ordering of quantifiers into scales, their ranges of use and prototypicality, and the biases inherent in the human approximate number system. The aim is to determine how these features are encoded in the models' architecture, how they may differ from those of humans, and whether the results are affected by the type of model and language under investigation. We find clear differences between humans and MLLMs with respect to these features across various tasks that tap into the representation of quantification in vivo vs. in silico. This work thus paves the way for addressing the nature of MLLMs as semantic and pragmatic agents, while the cross-linguistic lens can elucidate whether their abilities are robust and stable across different languages.