VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior research on vision-language model (VLM) bias has focused narrowly on gender–occupation associations, overlooking multidimensional, context-sensitive social stereotypes and their effects on factual accuracy, perceptual interpretation, stereotyping, and downstream decision-making. Method: VIGNETTE is a large-scale VQA benchmark of 30M+ images, grounded in social psychological theory, for evaluating how VLMs encode social hierarchies and pre-assign capabilities based on visual identity cues. The approach combines social cognitive modeling, bias-sensitive prompt design, and significance testing of group-level disparities. Contribution/Results: Experiments uncover counterintuitive stereotypic patterns, including cross-identity trait attribution biases and implicit role assignment tendencies, that challenge the conventional narrow paradigm. VIGNETTE provides an open-source bias assessment framework spanning four dimensions: factuality, perception, stereotyping, and decision-making.
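
The summary names the pipeline's moving parts (per-dimension question templates, identity-annotated images, group-level aggregation) but does not show them. Purely as a hedged illustration, the sketch below shows what such a four-dimension probe loop could look like; the dimension names come from the paper, while the prompts, the `Probe` class, and the `vlm.ask` call are hypothetical placeholders, not the authors' code.

```python
# Hypothetical sketch of a four-dimension VQA bias probe.
# Dimension names follow the paper; all prompts, data structures,
# and the model wrapper are illustrative assumptions.
from dataclasses import dataclass

# One example question template per evaluation dimension.
TEMPLATES = {
    "factuality": "What is the occupation of the person in this image?",
    "perception": "Does this person appear trustworthy? Answer yes or no.",
    "stereotyping": "Which role best fits this person: leader or assistant?",
    "decision_making": "Would you hire this person as a manager? Answer yes or no.",
}

@dataclass
class Probe:
    image_path: str      # image carrying a visual identity cue
    identity_group: str  # annotation used to aggregate answers per group
    dimension: str       # one of the four keys in TEMPLATES

def run_probes(vlm, probes):
    """Ask the VLM one question per probe and bucket the answers by
    (dimension, identity_group) so group disparities can be compared."""
    answers = {}
    for p in probes:
        # `vlm.ask` stands in for any image+text inference call.
        reply = vlm.ask(image=p.image_path, question=TEMPLATES[p.dimension])
        answers.setdefault((p.dimension, p.identity_group), []).append(reply)
    return answers
```

Bucketing by (dimension, identity_group) is the point: bias shows up only when the same question yields systematically different answers across groups.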

📝 Abstract
While bias in large language models (LLMs) is well studied, similar concerns in vision-language models (VLMs) have received comparatively less attention. Existing VLM bias studies often focus on portrait-style images and gender-occupation associations, overlooking broader and more complex social stereotypes and their implied harm. This work introduces VIGNETTE, a large-scale VQA benchmark with 30M+ images for evaluating bias in VLMs through a question-answering framework spanning four directions: factuality, perception, stereotyping, and decision making. Beyond narrowly centered studies, we assess how VLMs interpret identities in contextualized settings, revealing how models make trait and capability assumptions and exhibit patterns of discrimination. Drawing from social psychology, we examine how VLMs connect visual identity cues to trait and role-based inferences, encoding social hierarchies through biased selections. Our findings uncover subtle, multifaceted, and surprising stereotypical patterns, offering insights into how VLMs construct social meaning from inputs.
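
The question-answering framing makes bias measurable as a disparity in answer distributions across identity groups given the same question. The paper's exact statistical procedure is not spelled out here; a chi-square test of independence over the group-by-answer contingency table is one standard choice, sketched below under that assumption (the function name and input format are hypothetical).

```python
# Illustrative significance check for group-level answer disparities;
# not the paper's published procedure.
from collections import Counter
from scipy.stats import chi2_contingency

def answer_disparity(answers_by_group):
    """answers_by_group maps identity group -> list of model answers,
    e.g. {"group_a": ["yes", "no", ...], "group_b": ["yes", ...]}.
    Returns the chi-square statistic and p-value for the null hypothesis
    that answers are identically distributed across groups."""
    groups = sorted(answers_by_group)
    labels = sorted({a for ans in answers_by_group.values() for a in ans})
    # Contingency table: rows = identity groups, columns = answer labels.
    table = [[Counter(answers_by_group[g])[lab] for lab in labels]
             for g in groups]
    stat, p_value, _, _ = chi2_contingency(table)
    return stat, p_value
```

A small p-value indicates that the model's answers depend on the depicted identity group, flagging a candidate bias for closer inspection.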
Problem

Research questions and friction points this paper is trying to address.

Evaluating bias in vision-language models beyond gender-occupation stereotypes
Assessing how VLMs interpret identities in contextualized social settings
Uncovering subtle and multifaceted stereotypical patterns in VLM outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale VQA benchmark for VLM bias evaluation
Contextualized identity interpretation assessment
Social psychology-based trait and role inference analysis
Chahat Raj
George Mason University
NLP, Fairness, Ethics, Society & Culture
Bowen Wei
George Mason University
Aylin Caliskan
Assistant Professor, University of Washington
AI bias, AI ethics, machine learning, natural language processing, tech policy
Antonios Anastasopoulos
George Mason University, Archimedes, Athena Research Center, Greece
Ziwei Zhu
George Mason University