🤖 AI Summary
This study investigates whether vision-language models (VLMs) exhibit human-like cognitive control—specifically, goal prioritization and interference suppression under conflict. Grounded in classic paradigms (Stroop, Flanker, Simon), we introduce a psychophysical evaluation protocol tailored to multimodal foundation models and apply it to 108 models across 2,220 trials, including high-difficulty variants. Methodologically, we adapt human executive-function assessment protocols to VLMs, combining quantitative behavioral analysis with cross-model comparison. Results show that, under resource constraints, VLMs display human-like executive-function patterns alongside substantial inter-model variability; state-of-the-art models effectively suppress distractors and amplify target responses, exhibiting behavioral signatures closely consistent with human performance. This work establishes a novel, empirically grounded paradigm for modeling VLM cognition and evaluating controllability.
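As a concrete illustration of the paradigm (the paper's actual stimuli and prompts are not specified here, so everything below is an assumption), a Stroop-style trial can be rendered as an image whose word identity and ink color either agree (congruent) or conflict (incongruent); the model is then asked to report the ink color while ignoring the word. A minimal sketch using Pillow, with hypothetical colors and prompt wording:

```python
# Illustrative sketch only: the exact stimuli and prompts used in the
# study are assumptions here. A Stroop trial renders a color word in an
# ink color that either matches (congruent) or conflicts with
# (incongruent) the word itself.
from PIL import Image, ImageDraw

COLORS = {"red": (220, 40, 40), "green": (40, 160, 60), "blue": (40, 80, 220)}

def stroop_stimulus(word: str, ink: str, size=(320, 160)) -> Image.Image:
    """Render `word` in the RGB value of `ink` on a white background."""
    img = Image.new("RGB", size, "white")
    ImageDraw.Draw(img).text((60, 60), word.upper(), fill=COLORS[ink])
    return img

congruent = stroop_stimulus("red", "red")     # word and ink agree
incongruent = stroop_stimulus("red", "blue")  # word conflicts with ink
prompt = "What is the ink color of the word? Answer with one word."
```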
📝 Abstract
Cognitive control refers to the ability to flexibly coordinate thought and action in pursuit of internal goals. A standard method for assessing cognitive control involves conflict tasks that contrast congruent and incongruent trials, measuring the ability to prioritize relevant information while suppressing interference. We evaluate 108 vision-language models on three classic conflict tasks and their more demanding "squared" variants across 2,220 trials. Model performance corresponds closely to human behavior under resource constraints and reveals individual differences across models. These results indicate that some form of human-like executive function has emerged in current multimodal foundation models.
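The key behavioral measure in such conflict tasks is the congruency effect: the performance drop from congruent to incongruent trials, which indexes how well interference is suppressed. A minimal scoring sketch, assuming hypothetical per-trial records with a condition label and a correctness flag:

```python
# Minimal sketch, assuming (hypothetical) per-trial records of the form
# {"condition": "congruent" | "incongruent", "correct": bool}.
# The congruency effect is the accuracy lost on incongruent trials;
# a small effect with high overall accuracy indicates effective
# distractor suppression.
from statistics import mean

def congruency_effect(trials: list[dict]) -> float:
    # Accuracy per condition, then congruent minus incongruent.
    acc = {
        cond: mean(t["correct"] for t in trials if t["condition"] == cond)
        for cond in ("congruent", "incongruent")
    }
    return acc["congruent"] - acc["incongruent"]

trials = [
    {"condition": "congruent", "correct": True},
    {"condition": "congruent", "correct": True},
    {"condition": "incongruent", "correct": True},
    {"condition": "incongruent", "correct": False},
]
print(congruency_effect(trials))  # 0.5, i.e. a 50-percentage-point drop under conflict
```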