ColorFoil: Investigating Color Blindness in Large Vision and Language Models

📅 2024-05-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study identifies a systematic deficiency, termed "chromatic blindness," in the color perception of vision-and-language (V&L) models: despite color differences that are clearly discriminable to humans, the models struggle to identify basic colors such as red, white, and green. To assess this rigorously, the authors introduce ColorFoil, a zero-shot benchmark for color robustness that pairs image-text examples with adversarial color perturbations ("foils") for systematic evaluation. Experiments across major Transformer-based V&L models, including CLIP and its variants, GroupViT, ViLT, and BridgeTower, show that CLIP-family models and GroupViT consistently fail on color discriminations that are trivial for humans with normal color vision, whereas ViLT and BridgeTower exhibit much stronger color sensitivity. The benchmark provides a color-specific diagnostic for fine-grained, interpretable assessment of visual representations in V&L models.
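The summary describes foils as color-perturbed versions of an image's caption. Below is a minimal sketch of how such a foil might be generated by swapping one color word for another; the color vocabulary, function name, and word-swap heuristic are illustrative assumptions, not the paper's actual curation procedure.

```python
import random

# Basic color vocabulary, assumed here for illustration; the benchmark's
# actual color set may differ.
COLORS = ["red", "white", "green", "blue", "black", "yellow", "brown", "orange"]

def make_color_foil(caption, rng=None):
    """Replace the first color word in a caption with a different color,
    producing a 'foil' caption that no longer matches the image."""
    rng = rng or random.Random(0)
    tokens = caption.split()
    for i, tok in enumerate(tokens):
        core = tok.strip(".,").lower()
        if core in COLORS:
            trailing = tok[len(tok.rstrip(".,")):]  # keep punctuation such as "." or ","
            tokens[i] = rng.choice([c for c in COLORS if c != core]) + trailing
            return " ".join(tokens)
    return None  # caption contains no recognizable color word

print(make_color_foil("A red car parked beside a white fence."))
# e.g. "A green car parked beside a white fence."
```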

📝 Abstract
With the utilization of the Transformer architecture, large Vision and Language (V&L) models have shown promising performance even in zero-shot settings. Several studies, however, indicate a lack of robustness in these models when dealing with complex linguistic and visual attributes. In this work, we introduce a novel V&L benchmark, ColorFoil, created by generating color-related foils to assess the models' ability to perceive colors such as red, white, and green. We evaluate seven state-of-the-art V&L models, including CLIP, ViLT, GroupViT, and BridgeTower, in a zero-shot setting and present intriguing findings. The experimental evaluation indicates that ViLT and BridgeTower demonstrate much better color perception than CLIP, its variants, and GroupViT. Moreover, CLIP-based models and GroupViT struggle to distinguish colors that are visually distinct to humans with normal color perception.
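The zero-shot protocol implied by the abstract is a forced choice between a true caption and its color foil. Below is a minimal sketch of that evaluation for CLIP, assuming the Hugging Face transformers implementation; the checkpoint name, helper function, and accuracy loop are illustrative and not taken from the paper's code.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; the paper evaluates several CLIP variants.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prefers_true_caption(image: Image.Image, caption: str, foil: str) -> bool:
    """Return True if CLIP assigns higher image-text similarity to the
    true caption than to its color-foiled counterpart."""
    inputs = processor(text=[caption, foil], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, 2)
    return logits[0, 0].item() > logits[0, 1].item()

# Accuracy over a (hypothetical) list of (image, caption, foil) triples:
# accuracy = sum(prefers_true_caption(img, cap, foil)
#                for img, cap, foil in dataset) / len(dataset)
```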
Problem

Research questions and friction points this paper is trying to address.

Color Blindness
Large-scale Vision-Language Models
Color Discrimination
Innovation

Methods, ideas, or system contributions that make the work stand out.

ColorFoil
Transformer Models
Color Recognition
Authors
Ahnaf Mozib Samin (University of Malta, Msida, Malta)
M. F. Ahmed (Shahjalal University of Science and Technology, Sylhet, Bangladesh)
Md. Mushtaq Shahriyar Rafee (Metropolitan University, Sylhet, Bangladesh)