CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation

📅 2025-06-10

🤖 AI Summary

Current text-to-image models lack reliable, quantifiable evaluation methods for cross-cultural fairness, which hinders rigorous assessment of cultural bias across sociocultural contexts. To address this, we propose CAIRe, an automated evaluation metric that grounds the entities and concepts in an image to a knowledge base and then gives an independent graded judgment for each user-defined culture label. The evaluation covers two scenarios: culturally salient but rare items, drawn from a manually curated dataset built using language models, and culturally universal concepts, drawn from one set of T2I-generated images and one set of naturally occurring images. On the rare-item dataset, CAIRe outperforms all baselines by 28 F1 points. On the generated and naturally occurring sets, it reaches Pearson correlations of 0.56 and 0.66, respectively, with human ratings on a 5-point Likert scale of cultural relevance.

📝 Abstract

As text-to-image models become increasingly prevalent, ensuring their equitable performance across diverse cultural contexts is critical. Efforts to mitigate cross-cultural biases have been hampered by trade-offs, including a loss in performance, factual inaccuracies, or offensive outputs. Despite widespread recognition of these challenges, an inability to reliably measure these biases has stalled progress. To address this gap, we introduce CAIRe, a novel evaluation metric that assesses the degree of cultural relevance of an image, given a user-defined set of labels. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label. On a manually curated dataset of culturally salient but rare items built using language models, CAIRe surpasses all baselines by 28 F1 points. Additionally, we construct two datasets for culturally universal concepts, one comprising T2I-generated outputs and another retrieved from naturally occurring data. CAIRe achieves Pearson correlations of 0.56 and 0.66 with human ratings on these sets, based on a 5-point Likert scale of cultural relevance. This demonstrates its strong alignment with human judgment across diverse image sources.
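To make the reported agreement numbers concrete, here is a minimal sketch of how a Pearson correlation between metric scores and 5-point Likert ratings can be computed. The function name `pearson` and the sample values are illustrative assumptions, not data or code from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only (not results from the paper).
metric_scores = [1.0, 2.5, 3.0, 4.5, 5.0]  # hypothetical metric outputs
human_ratings = [1, 2, 4, 4, 5]            # hypothetical 5-point Likert ratings

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.2f}")
```

A value near 1 means the metric ranks and spaces images much as the human raters do; the 0.56 and 0.66 reported above indicate moderate to fairly strong linear agreement.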

Problem

Research questions and friction points this paper is trying to address.

Measure cultural bias in text-to-image models

Assess cultural relevance of generated images

Improve evaluation of cross-cultural performance gaps

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented evaluation for cultural relevance

Knowledge-based grounding of image entities

Graded judgments using factual information
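The three ideas listed above can be illustrated with a toy sketch: entities detected in an image are looked up in a knowledge base, and each culture label then receives its own graded score. Everything here is an assumption made for illustration (the `TOY_KB` contents, the `score_labels` function, and the mapping to a 1-to-5 scale); it is not the authors' implementation, which the summary does not describe at this level of detail.

```python
from typing import Dict, List

# Hypothetical toy knowledge base: entity -> {culture label: association in [0, 1]}.
# The entries and strengths are invented for illustration.
TOY_KB: Dict[str, Dict[str, float]] = {
    "kimono": {"Japan": 1.0},
    "chopsticks": {"Japan": 0.6, "China": 0.7},
    "teacup": {"Japan": 0.3, "China": 0.4, "United Kingdom": 0.4},
}


def score_labels(
    entities: List[str],
    labels: List[str],
    kb: Dict[str, Dict[str, float]],
) -> Dict[str, int]:
    """Give each culture label an independent 1-5 relevance score.

    Labels are scored separately: a high score for one label does not
    lower the score of another (no normalization across labels).
    """
    scores: Dict[str, int] = {}
    for label in labels:
        # Association strength of every grounded entity with this label.
        strengths = [kb.get(entity, {}).get(label, 0.0) for entity in entities]
        strongest = max(strengths, default=0.0)
        # Map the strongest association from [0, 1] onto a 1-5 scale.
        scores[label] = 1 + round(4 * strongest)
    return scores


result = score_labels(["kimono", "teacup"], ["Japan", "China", "Brazil"], TOY_KB)
print(result)
```

Scoring labels independently lets an image be rated as relevant to several cultures at once, or to none, which a single forced-choice classification would not allow.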

🔎 Similar Papers

How Culturally Aware are Vision-Language Models?