ColorBlindnessEval: Can Vision-Language Models Pass Color Blindness Tests?

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the robustness evaluation of vision-language models (VLMs) under conditions of color vision deficiency. We propose ColorBlindnessEval, the first adversarial benchmark for VLMs inspired by the Ishihara color blindness test. It comprises 500 Ishihara-like images, each embedding a number from 0 to 99 within chromatically complex dot patterns, and employs both binary (Yes/No) and open-ended question-answering prompts to systematically assess digit recognition across nine state-of-the-art VLMs, with human performance as a reference. By adapting clinical color-vision testing paradigms to VLM evaluation, we show for the first time that VLMs suffer severe accuracy degradation and exhibit pervasive textual hallucinations under chromatic confusion. The experiments reveal that current VLMs rely heavily on superficial texture and contextual cues and lack intrinsic color-invariant visual perception. These findings underscore the necessity of developing visually robust VLMs and highlight color robustness as a novel, clinically grounded evaluation dimension.
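The summary describes stimuli in which a digit is distinguishable from its background only by a confusable color pair. The following is a minimal, hypothetical sketch of that idea (not the authors' actual generator): random dots are scattered over a grid, and each dot takes the "figure" color if it falls on a hardcoded digit mask, otherwise the "ground" color. The bitmap, color values, and dot counts are all illustrative assumptions.

```python
import random

# 5x7 bitmap mask for the digit "3" (1 = part of the digit).
# Illustrative only; the real benchmark embeds numbers 0-99.
DIGIT_3 = [
    "11111",
    "00001",
    "00001",
    "01111",
    "00001",
    "00001",
    "11111",
]

# A red/green pair chosen to be chromatically confusable (assumed values).
FIGURE_COLOR = (200, 90, 80)
GROUND_COLOR = (110, 160, 75)

def sample_dots(n_dots, width=50, height=70, seed=0):
    """Scatter random dots; color each by whether it lies on the digit mask."""
    rng = random.Random(seed)
    cell_w, cell_h = width / 5, height / 7
    dots = []
    for _ in range(n_dots):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        col = min(int(x // cell_w), 4)
        row = min(int(y // cell_h), 6)
        on_digit = DIGIT_3[row][col] == "1"
        dots.append((x, y, FIGURE_COLOR if on_digit else GROUND_COLOR))
    return dots

dots = sample_dots(500)
figure_dots = sum(1 for _, _, c in dots if c == FIGURE_COLOR)
print(f"{figure_dots} of {len(dots)} dots form the digit")
```

Rendering these dots with luminance equalized between the two colors would leave chroma as the only cue to the digit, which is the property the benchmark exploits.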

📝 Abstract
This paper presents ColorBlindnessEval, a novel benchmark designed to evaluate the robustness of Vision-Language Models (VLMs) in visually adversarial scenarios inspired by the Ishihara color blindness test. Our dataset comprises 500 Ishihara-like images featuring numbers from 0 to 99 with varying color combinations, challenging VLMs to accurately recognize numerical information embedded in complex visual patterns. We assess 9 VLMs using Yes/No and open-ended prompts and compare their performance with human participants. Our experiments reveal limitations in the models' ability to interpret numbers in adversarial contexts, highlighting prevalent hallucination issues. These findings underscore the need to improve the robustness of VLMs in complex visual environments. ColorBlindnessEval serves as a valuable tool for benchmarking and improving the reliability of VLMs in real-world applications where accuracy is critical.
Problem

Research questions and friction points this paper is trying to address.

Evaluating VLM robustness in color blindness test scenarios
Assessing numerical recognition in adversarial visual patterns
Identifying hallucination issues in complex visual environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark tests VLMs with Ishihara-like color patterns
Evaluates numerical recognition using adversarial color combinations
Identifies hallucination issues through Yes/No and open-ended prompts
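The two prompt styles above can be sketched as a small scoring harness. This is a hypothetical illustration of the evaluation protocol, not the authors' exact prompts or scoring code; the prompt wording and the token-level matching rule are assumptions.

```python
def yes_no_prompt(digit):
    """Binary probe: assumed phrasing for the Yes/No condition."""
    return f"Is the number {digit} present in this image? Answer Yes or No."

# Assumed phrasing for the open-ended condition.
OPEN_ENDED_PROMPT = "What number is shown in this image?"

def score_yes_no(records):
    """records: (ground_truth_present: bool, model_answer: str) pairs.
    An answer starting with 'yes' counts as an affirmative response."""
    correct = sum(
        ans.strip().lower().startswith("yes") == present
        for present, ans in records
    )
    return correct / len(records)

def score_open_ended(records):
    """records: (true_number: int, model_answer: str) pairs;
    credit is given if the true number appears as a token in the answer."""
    correct = sum(str(d) in ans.split() for d, ans in records)
    return correct / len(records)

# Toy illustration with fabricated answers (the last one hallucinates
# a number that is not in the image):
yn = [(True, "Yes"), (True, "No"), (False, "No"), (False, "Yes, it is 7.")]
print(score_yes_no(yn))  # 0.5
```

A "Yes" on an image that does not contain the queried number, as in the last record, is the kind of hallucination the benchmark is designed to surface.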
Zijian Ling
Apply U, United Kingdom
Han Zhang
Apply U, United Kingdom
Yazhuo Zhou
Apply U, United Kingdom
Jiahao Cui
Huazhong University of Science and Technology
Computer Vision · Deep Learning · Computational Photography