A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization

πŸ“… 2025-11-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing story visualization models exhibit significant cultural bias, yielding outputs that lack cross-cultural authenticity and adaptability. To address this, we propose the first progressive multicultural evaluation framework, introducing five novel metrics and an MLLM-as-Jury automated adjudication system that enables multilingual, quantitative assessment of cultural appropriateness, visual aesthetics, and narrative coherence. Validated on the FlintstonesSV and VIST datasets through combined human evaluation and large language model–based judgment, our method demonstrates robust effectiveness. Experimental results reveal a clear cultural stratification in model performance: English achieves the highest overall fidelity, Chinese excels in narrative coherence, and Hindi lags across all dimensions. The framework uncovers latent cultural biases embedded in current multilingual text-to-image models, exposing non-uniform bias distributions across linguistic and cultural groups. This work establishes a foundational methodology for diagnosing and mitigating cultural inequity in generative storytelling systems.


πŸ“ Abstract
Recent advancements in text-to-image generative models have improved narrative consistency in story visualization. However, current story visualization models often overlook cultural dimensions, resulting in visuals that lack authenticity and cultural fidelity. In this study, we conduct a comprehensive multicultural analysis of story visualization using current text-to-image models across multilingual settings on two datasets: FlintstonesSV and VIST. To assess cultural dimensions rigorously, we propose a Progressive Multicultural Evaluation Framework and introduce five story visualization metrics (Cultural Appropriateness, Visual Aesthetics, Cohesion, Semantic Consistency, and Object Presence) that are not addressed by existing metrics. We further automate assessment through an MLLM-as-Jury framework that approximates human judgment. Human evaluations show that models generate more coherent, visually appealing, and culturally appropriate stories for real-world datasets than for animated ones. The generated stories exhibit a stronger alignment with English-speaking cultures across all metrics except Cohesion, where Chinese performs better. In contrast, Hindi ranks lowest on all metrics except Visual Aesthetics, reflecting real-world cultural biases embedded in current models. This multicultural analysis provides a foundation for future research aimed at generating culturally appropriate and inclusive visual stories across diverse linguistic and cultural settings.
Problem

Research questions and friction points this paper is trying to address.

Evaluates cultural biases in story visualization models
Proposes a multicultural evaluation framework with new metrics
Assesses cultural appropriateness across multilingual datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Multicultural Evaluation Framework for cultural assessment
Five new metrics measuring cultural and visual story aspects
MLLM-as-Jury framework that automates evaluation, approximating human judgment
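The paper does not detail how juror verdicts are combined, so the following is a minimal sketch, assuming each MLLM juror returns a 1–5 score per metric and per-metric scores are averaged across jurors (the function name and scoring scale are illustrative assumptions, not the authors' implementation):

```python
from statistics import mean

# The five metrics proposed in the paper.
METRICS = [
    "cultural_appropriateness",
    "visual_aesthetics",
    "cohesion",
    "semantic_consistency",
    "object_presence",
]

def aggregate_jury_scores(juror_scores):
    """Combine verdicts from several MLLM jurors into one score per metric.

    juror_scores: list of dicts, one per juror, mapping each metric
    name in METRICS to a numeric score (assumed 1-5 scale here).
    Returns a dict of per-metric means across jurors.
    """
    return {m: mean(j[m] for j in juror_scores) for m in METRICS}

# Example: two hypothetical jurors rating one generated story.
juror_a = {m: 4 for m in METRICS}
juror_b = {m: 3 for m in METRICS}
print(aggregate_jury_scores([juror_a, juror_b]))
```

A simple mean is only one possible adjudication rule; majority voting or discarding outlier jurors are common alternatives in LLM-as-judge setups.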
Janak Kapuriya
Data Science Institute, University of Galway, Galway, Ireland
Ali Hatami
Data Science Institute, University of Galway, Galway, Ireland
Paul Buitelaar
Professor in Data Analytics, Data Science Institute, Univ of Galway, Co-PI Insight Centre
Natural Language Processing, Knowledge Graphs, Text Mining, Semantics