Core Knowledge Deficits in Multi-Modal Language Models

📅 2024-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies systematic deficits in multimodal large language models (MLLMs) with respect to core knowledge—rudimentary cognitive capacities, such as object permanence, that are innate to humans from early childhood—showing markedly lower robustness than human cognition on tasks humans find intuitive. To probe this, the authors introduce CoreCognition, a large-scale benchmark grounded in developmental cognitive science that covers 12 core cognitive concepts, and evaluate 219 MLLMs with 10 prompts each, yielding 2,409 data points for analysis. They further propose Concept Hacking, an evaluation technique based on concept-level perturbations, which demonstrates that MLLMs rely on shortcut learning and illusory understanding rather than genuine conceptual grasp. Results show little to no scaling for low-level cognitive abilities: while high-level reasoning approaches human performance, early-developing cognition lags substantially, and this gap persists across models as they scale.

📝 Abstract
While Multimodal Large Language Models (MLLMs) demonstrate impressive abilities in high-level perception and reasoning, their robustness in the wild still lags behind humans, and they exhibit diminished efficacy on simple tasks that are intuitive for humans. We examine the hypothesis that these deficiencies stem from the absence of core knowledge—rudimentary cognitive abilities innate to humans from early childhood. To probe core knowledge representation in MLLMs, we draw from developmental cognitive sciences and develop a large-scale benchmark, the CoreCognition dataset, encompassing 12 core cognitive concepts. We evaluate 219 models with 10 different prompts, leading to a total of 2,409 data points for analysis. Our findings reveal core knowledge deficits in early-developed core abilities, while models demonstrate human-comparable performance in high-level cognition. Moreover, we find that low-level abilities show little to no scaling, in stark contrast to high-level abilities. Finally, we introduce an evaluation technique, Concept Hacking, through which we demonstrate that MLLMs do not genuinely advance toward core knowledge but instead rely on illusory understanding and shortcut learning as they scale. Website: https://growing-ai-like-a-child.github.io/
Problem

Research questions and friction points this paper is trying to address.

Examines core knowledge deficits in Multimodal Large Language Models.
Develops the CoreCognition dataset to evaluate 12 core cognitive concepts.
Introduces Concept Hacking to reveal illusory understanding and shortcut learning under scaling.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed the CoreCognition dataset for evaluating core cognitive concepts.
Introduced the Concept Hacking evaluation technique.
Analyzed 219 models with 10 prompts each.
Yijiang Li
Argonne National Laboratory
Qingying Gao
University of California San Diego
Haoran Sun
University of North Carolina at Chapel Hill
Haiyun Lyu
University of Michigan
Dezhi Luo
University of Michigan
cognitive science, philosophy, AI
Hokin Deng
Johns Hopkins University
cognition