Cultural Awareness in Vision-Language Models: A Cross-Country Exploration

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study reveals that vision-language models (VLMs) systematically encode and amplify sociocultural stereotypes related to race, gender, and body morphology across cultural contexts. To address the lack of rigorous evaluation frameworks, we propose three novel cross-cultural retrieval tasks—race–country, trait–country, and body morphology–country—and introduce the first quantitative, comparable, and interpretable benchmark for assessing cultural bias in VLMs. Our method employs a zero-shot retrieval paradigm, integrating multi-national image datasets with semantically engineered prompts, requiring neither fine-tuning nor human annotation. Empirical evaluation across leading VLMs uncovers consistent cross-national bias patterns—for instance, “Criminal” retrieving predominantly Black-associated images and “Obese” retrieving primarily U.S.-associated images—demonstrating pervasive implicit cultural biases. This work establishes a new methodological foundation and publicly available benchmark for fairness assessment in vision-language modeling.
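The zero-shot retrieval paradigm described above can be sketched in a few lines: a text prompt and a pool of images are embedded into a shared space by a CLIP-style VLM, and images are ranked by cosine similarity to the prompt. The snippet below is a minimal illustration using mock embeddings in place of real encoder outputs; the function name and toy data are assumptions for demonstration, not the paper's released code.

```python
import numpy as np

def rank_images(text_emb, image_embs):
    """Rank images by cosine similarity to a text-prompt embedding,
    as in CLIP-style zero-shot retrieval (highest similarity first)."""
    text = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ text
    order = np.argsort(-sims)  # descending similarity
    return order, sims

# Toy stand-ins for encoder outputs: 3 "image" embeddings and one
# prompt embedding (e.g. for "a photo of a criminal person").
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(3, 8))
text_emb = image_embs[1] + 0.1 * rng.normal(size=8)  # prompt nearest image 1

order, sims = rank_images(text_emb, image_embs)
print(order[0])  # image 1 ranks first by construction
```

In the paper's setup, bias is then measured by which country- or group-associated images dominate the top of this ranking for each engineered trait prompt, with no fine-tuning or human annotation required.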

📝 Abstract
Vision-Language Models (VLMs) are increasingly deployed in diverse cultural contexts, yet their internal biases remain poorly understood. In this work, we propose a novel framework to systematically evaluate how VLMs encode cultural differences and biases related to race, gender, and physical traits across countries. We introduce three retrieval-based tasks: (1) Race to Country retrieval, which examines the association between individuals from specific racial groups (East Asian, White, Middle Eastern, Latino, South Asian, and Black) and different countries; (2) Personal Traits to Country retrieval, where images are paired with trait-based prompts (e.g., Smart, Honest, Criminal, Violent) to investigate potential stereotypical associations; and (3) Physical Characteristics to Country retrieval, focusing on visual attributes like skinny, young, obese, and old to explore how physical appearances are culturally linked to nations. Our findings reveal persistent biases in VLMs, highlighting how visual representations may inadvertently reinforce societal stereotypes.
Problem

Research questions and friction points this paper is trying to address.

Evaluating cultural biases in Vision-Language Models across countries
Assessing racial and trait stereotypes in VLM country associations
Exploring physical appearance biases linked to nations in VLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a framework for evaluating cultural biases in VLMs
Introduces retrieval tasks linking race, traits, and physical characteristics to countries
Reveals persistent stereotypical biases in VLM visual representations