🤖 AI Summary
Mainstream vision-language datasets exhibit significant cultural and socioeconomic biases, overrepresenting high-income Western contexts and consequently impairing model generalization in low-income and non-Western communities. To address this, we propose a function-centric, cross-cultural object modeling paradigm and introduce the Culture Affordance Atlas, the first vision-language dataset spanning 46 everyday functional categories and 288 object types across diverse economic backgrounds. Built on the Dollar Street dataset, it features human-verified, function-aligned annotations and integrates CLIP-based cross-cultural semantic analysis. We further introduce two novel evaluation metrics: functional consistency and inter-group performance disparity. Empirical results demonstrate that function-centric labeling reduces the median performance gap between high- and low-income groups by 6 percentage points, substantially improving recognition accuracy in resource-constrained settings and enhancing algorithmic fairness.
📝 Abstract
Culture shapes the objects people use and the purposes they serve, yet mainstream Vision-Language (VL) datasets frequently exhibit cultural biases, disproportionately favoring higher-income, Western contexts. This imbalance reduces model generalizability and perpetuates performance disparities, especially for lower-income and non-Western communities. To address these disparities, we propose a novel function-centric framework that categorizes objects by the functions they fulfill across diverse cultural and economic contexts. We implement this framework by creating the Culture Affordance Atlas, a re-annotated, culturally grounded restructuring of the Dollar Street dataset spanning 46 functions and 288 objects, publicly available at https://lit.eecs.umich.edu/CultureAffordance-Atlas/index.html. Through extensive empirical analyses using the CLIP model, we demonstrate that function-centric labels substantially reduce the socioeconomic performance gap between high- and low-income groups by a median of 6 pp (a statistically significant reduction), improving model effectiveness in lower-income contexts. Furthermore, our analyses reveal numerous culturally essential objects that are frequently overlooked in prominent VL datasets. Our contributions offer a scalable pathway toward building inclusive VL datasets and equitable AI systems.
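As a rough illustration of the inter-group performance disparity metric, the sketch below runs CLIP zero-shot classification with function-centric prompts and compares per-function accuracy across income groups. It uses the Hugging Face `transformers` CLIP API; the model variant, `load_examples`, the function subset, and the prompt template are all hypothetical placeholders for illustration, not the paper's released code.

```python
from statistics import median

import torch
from transformers import CLIPModel, CLIPProcessor

# Assumption: the exact CLIP variant used in the paper may differ.
MODEL_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

# Illustrative subset of the 46 functional categories.
FUNCTIONS = ["storing water", "preparing food", "lighting the home"]
# Hypothetical prompt template for function-centric labels.
PROMPTS = [f"an object used for {f}" for f in FUNCTIONS]

def load_examples():
    """Hypothetical loader yielding (PIL.Image, function_label, income_group)
    triples from a Dollar-Street-style dataset; substitute real data here."""
    yield from []

@torch.no_grad()
def classify(image):
    # Score the image against every function prompt and take the argmax.
    inputs = processor(text=PROMPTS, images=image,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image  # shape: (1, len(PROMPTS))
    return FUNCTIONS[logits.argmax(dim=-1).item()]

# Tally per-(group, function) correct predictions and counts.
hits, totals = {}, {}
for image, label, group in load_examples():
    key = (group, label)
    totals[key] = totals.get(key, 0) + 1
    hits[key] = hits.get(key, 0) + int(classify(image) == label)

def accuracy(group, label):
    return hits.get((group, label), 0) / max(totals.get((group, label), 0), 1)

# Inter-group performance disparity: the per-function accuracy gap between
# income groups, summarized by its median (the quantity the 6 pp claim tracks).
gaps = [accuracy("high_income", f) - accuracy("low_income", f) for f in FUNCTIONS]
print(f"median inter-group disparity: {median(gaps):.3f}")
```

Repeating this computation once with object-type labels and once with function-centric labels, then comparing the two median gaps, would reproduce the shape of the paper's headline comparison under these assumptions.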