Cultural Awareness in Vision-Language Models: A Cross-Country Exploration

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study reveals that vision-language models (VLMs) systematically encode and amplify sociocultural stereotypes related to race, gender, and body morphology across cultural contexts. To address the lack of rigorous evaluation frameworks, we propose three novel cross-cultural retrieval tasks—race–country, trait–country, and body morphology–country—and introduce the first quantitative, comparable, and interpretable benchmark for assessing cultural bias in VLMs. Our method employs a zero-shot retrieval paradigm, integrating multi-national image datasets with semantically engineered prompts, requiring neither fine-tuning nor human annotation. Empirical evaluation across leading VLMs uncovers consistent cross-national bias patterns—for instance, “Criminal” retrieving predominantly Black-associated images and “Obese” retrieving primarily U.S.-associated images—demonstrating pervasive implicit cultural biases. This work establishes a new methodological foundation and publicly available benchmark for fairness assessment in vision-language modeling.
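The zero-shot retrieval paradigm described above can be sketched in a few lines: a text prompt and a pool of images are embedded into a shared space by a CLIP-style VLM, and images are ranked by cosine similarity to the prompt. The snippet below is a minimal illustration using mock embeddings in place of real encoder outputs; the function name and toy data are assumptions for demonstration, not the paper's released code.

```python
import numpy as np

def rank_images(text_emb, image_embs):
    """Rank images by cosine similarity to a text-prompt embedding,
    as in CLIP-style zero-shot retrieval (highest similarity first)."""
    text = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ text
    order = np.argsort(-sims)  # descending similarity
    return order, sims

# Toy stand-ins for encoder outputs: 3 "image" embeddings and one
# prompt embedding (e.g. for "a photo of a criminal person").
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(3, 8))
text_emb = image_embs[1] + 0.1 * rng.normal(size=8)  # prompt nearest image 1

order, sims = rank_images(text_emb, image_embs)
print(order[0])  # image 1 ranks first by construction
```

In the paper's setup, bias is then measured by which country- or group-associated images dominate the top of this ranking for each engineered trait prompt, with no fine-tuning or human annotation required.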

📝 Abstract
Vision-Language Models (VLMs) are increasingly deployed in diverse cultural contexts, yet their internal biases remain poorly understood. In this work, we propose a novel framework to systematically evaluate how VLMs encode cultural differences and biases related to race, gender, and physical traits across countries. We introduce three retrieval-based tasks: (1) Race to Country retrieval, which examines the association between individuals from specific racial groups (East Asian, White, Middle Eastern, Latino, South Asian, and Black) and different countries; (2) Personal Traits to Country retrieval, where images are paired with trait-based prompts (e.g., Smart, Honest, Criminal, Violent) to investigate potential stereotypical associations; and (3) Physical Characteristics to Country retrieval, focusing on visual attributes like skinny, young, obese, and old to explore how physical appearances are culturally linked to nations. Our findings reveal persistent biases in VLMs, highlighting how visual representations may inadvertently reinforce societal stereotypes.
Problem

Research questions and friction points this paper is trying to address.

Evaluating cultural biases in Vision-Language Models across countries
Assessing racial and trait stereotypes in VLM country associations
Exploring physical appearance biases linked to nations in VLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a framework for evaluating cultural biases in VLMs
Introduces retrieval tasks linking race, traits, and physical characteristics to countries
Reveals persistent stereotypical biases in VLM visual representations