Probing Perceptual Constancy in Large Vision Language Models

📅 2025-02-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work investigates whether large vision-language models (VLMs) possess human-level perceptual constancy: the ability to stably recognize object color, size, and shape under sensory variations such as illumination, viewpoint, and distance. We systematically evaluate 33 state-of-the-art VLMs across 253 controlled experiments grounded in classical psychological paradigms, spanning single images, videos, and real-world scenes, and introduce PerCon, the first cross-modal perceptual constancy benchmark. Results reveal a pronounced dissociation among the three constancy types, with shape constancy being markedly weakest (best model accuracy: 68.3%, substantially below human performance); substantial inter-model variability further exposes fundamental deficiencies in underlying perceptual robustness. This study provides the first quantitative characterization of structural limitations in VLM perceptual constancy, establishing a novel evaluation framework and actionable directions for advancing trustworthy visual understanding.

๐Ÿ“ Abstract
Perceptual constancy is the ability to maintain stable perceptions of objects despite changes in sensory input, such as variations in distance, angle, or lighting. This ability is crucial for recognizing visual information in a dynamic world, making it essential for Vision-Language Models (VLMs). However, whether VLMs are currently and theoretically capable of mastering this ability remains underexplored. In this study, we evaluated 33 VLMs using 253 experiments across three domains: color, size, and shape constancy. The experiments included single-image and video adaptations of classic cognitive tasks, along with novel tasks in in-the-wild conditions, to evaluate the models' recognition of object properties under varying conditions. We found significant variability in VLM performance, with models' performance in shape constancy clearly dissociated from that in color and size constancy.
Problem

Research questions and friction points this paper is trying to address.

Evaluate perceptual constancy in Vision-Language Models
Assess VLMs' ability to recognize object properties
Identify variability in performance across constancy domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated 33 VLMs across color, size, and shape constancy domains
Conducted 253 experiments spanning single images, videos, and in-the-wild scenes
Assessed constancy under dynamic, varying conditions
Haoran Sun
Johns Hopkins University, University of California, San Diego
Suyang Yu
Johns Hopkins University, University of California, San Diego
Yijiang Li
Argonne National Laboratory
Qingying Gao
Johns Hopkins University, University of California, San Diego
Haiyun Lyu
University of North Carolina at Chapel Hill, Carnegie Mellon University
Hokin Deng
Johns Hopkins University
Dezhi Luo
University of Michigan