A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models

📅 2024-02-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the lack of systematic evaluation of higher-order cognitive capabilities in Large Vision-Language Models (LVLMs). To this end, we introduce CogniBench—the first benchmark specifically designed to assess higher-order cognition in LVLMs. Inspired by the clinical “Cookie Theft” neuropsychological test, CogniBench comprises 251 semantically rich images and pioneers the adaptation of human clinical cognitive assessment paradigms to LVLM evaluation. We formally define eight cognitive dimensions—including abstract reasoning and causal inference—and establish a dual-track, fine-grained annotation framework covering image captioning and visual question answering. Experimental results reveal that state-of-the-art LVLMs achieve accuracies below 45% on abstract reasoning and causal inference tasks—substantially underperforming human baselines—thereby exposing fundamental limitations in their higher-order cognitive reasoning abilities.

📝 Abstract
Large Vision-Language Models (LVLMs), despite their recent success, have hardly been comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the Cookie Theft task in human cognitive assessment, we propose a novel benchmark to evaluate the high-level cognitive abilities of LVLMs using images with rich semantics. The benchmark comprises 251 images with comprehensive annotations. It defines eight reasoning capabilities and includes an image description task and a visual question answering task. Our evaluation of well-known LVLMs shows that a significant gap in cognitive abilities remains between LVLMs and humans.
Problem

Research questions and friction points this paper is trying to address.

Evaluate cognitive abilities of LVLMs
Propose novel benchmark using rich semantic images
Identify gap between LVLMs and human cognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cognitive evaluation benchmark for LVLMs
Rich semantic images for testing
Eight reasoning capabilities defined