Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models

📅 2025-03-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines gender and racial stereotyping in vision-language models (VLMs) as a function of facial appearance, addressing three limitations of prior bias research: overreliance on trait associations, constrained evaluation settings, and binary identity assumptions. The analysis quantifies both trait associations and homogeneity bias, using a standardized face image dataset, open-ended story generation across four VLMs, and prototypicality measures, with results compared across models. Results show that: (1) VLMs generate more homogeneous narratives for women than for men, and for White Americans than for Black Americans; (2) greater gender prototypicality is associated with greater homogeneity, whereas race prototypicality is not; (3) “basketball” is the only trait association linked to Black Americans consistently across all four models. The findings suggest that stereotyping can appear implicitly as unequal diversity in generated text, not only as explicit attribute associations.

📝 Abstract

Current research on bias in Vision Language Models (VLMs) has important limitations: it is focused exclusively on trait associations while ignoring other forms of stereotyping, it examines specific contexts where biases are expected to appear, and it conceptualizes social categories like race and gender as binary, ignoring the multifaceted nature of these identities. Using standardized facial images that vary in prototypicality, we test four VLMs for both trait associations and homogeneity bias in open-ended contexts. We find that VLMs consistently generate more uniform stories for women compared to men, with people who are more gender prototypical in appearance being represented more uniformly. By contrast, VLMs represent White Americans more uniformly than Black Americans. Unlike with gender prototypicality, race prototypicality was not related to stronger uniformity. In terms of trait associations, we find limited evidence of stereotyping: Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM. These findings demonstrate that VLM stereotyping manifests in ways that go beyond simple group membership, suggesting that conventional bias mitigation strategies may be insufficient to address VLM stereotyping and that homogeneity bias persists even when trait associations are less apparent in model outputs.
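
The abstract does not say how story uniformity is measured. As a rough illustration only, the sketch below scores a group of generated stories by their mean pairwise cosine similarity over bag-of-words counts, where a higher score means more uniform stories. The similarity measure, the function names, and the toy stories are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def bow(text):
    """Lowercased bag-of-words counts for one story (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(stories):
    """Average cosine similarity over all unordered pairs of stories.

    Higher values mean the stories resemble one another more closely,
    which is one possible proxy for homogeneity.
    """
    vectors = [bow(s) for s in stories]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two stories")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy, hand-written stories under placeholder group labels (not real model output).
toy_stories = {
    "group_a": ["she walks to the park", "she walks to the shop", "she walks to the park again"],
    "group_b": ["he repairs an old radio", "a storm delays the ferry", "the choir rehearses at dawn"],
}
scores = {group: mean_pairwise_similarity(texts) for group, texts in toy_stories.items()}
```

Comparing `scores` across groups gives a simple uniformity contrast. A real analysis would likely use sentence embeddings and a statistical model in place of raw word counts.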

Problem

Research questions and friction points this paper is trying to address.

Addresses stereotyping in Vision-Language Models beyond trait associations.

Explores homogeneity bias in VLMs using gender and race prototypicality.

Highlights limitations of current bias mitigation strategies in VLMs.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tests VLMs for bias using standardized facial images that vary in prototypicality.

Finds homogeneity bias by gender and race: more uniform stories for women than for men, and for White Americans than for Black Americans.

Conventional bias mitigation strategies may be insufficient.
