🤖 AI Summary
Large language models (LLMs) perform well on many multimodal tasks, but their effectiveness in cross-cultural settings remains limited because existing data and models are predominantly Western-centric. Building on the strong problem-solving capabilities of multi-agent systems, the authors introduce MosAIG, a multi-agent framework in which LLMs endowed with distinct cultural personas collaborate on the novel task of multicultural image generation. Key contributions: (1) the MosAIG framework, which enhances multicultural image generation through multi-agent interaction among persona-conditioned LLMs; (2) a dataset of 9,000 multicultural images spanning five countries, three age groups, two genders, 25 historical landmarks, and five languages; and (3) evidence that multi-agent interactions outperform simple, no-agent models across multiple evaluation metrics. The dataset and models are publicly released to support future research on equitable and inclusive multimodal AI.
📝 Abstract
Large Language Models (LLMs) demonstrate impressive performance across various multimodal tasks. However, their effectiveness in cross-cultural contexts remains limited due to the predominantly Western-centric nature of existing data and models. Meanwhile, multi-agent models have shown strong capabilities in solving complex tasks. In this paper, we evaluate the performance of LLMs in a multi-agent interaction setting for the novel task of multicultural image generation. Our key contributions are: (1) We introduce MosAIG, a Multi-Agent framework that enhances multicultural Image Generation by leveraging LLMs with distinct cultural personas; (2) We provide a dataset of 9,000 multicultural images spanning five countries, three age groups, two genders, 25 historical landmarks, and five languages; and (3) We demonstrate that multi-agent interactions outperform simple, no-agent models across multiple evaluation metrics, offering valuable insights for future research. Our dataset and models are available at https://github.com/OanaIgnat/MosAIG.
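The abstract describes agents with distinct cultural personas collaborating to generate one multicultural image, but does not detail the interaction protocol here. As a rough illustrative sketch only, assuming each persona agent contributes a persona-conditioned description (stubbed with a template where the real framework would call an LLM) that an aggregator merges into a single image-generation prompt, all class and function names below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PersonaAgent:
    """Hypothetical agent endowed with a cultural persona
    (country, age group, gender, language), mirroring the
    dataset dimensions named in the abstract."""
    country: str
    age_group: str
    gender: str
    language: str

    def describe(self, landmark: str) -> str:
        # In the real framework an LLM with this persona would
        # produce the description; a fixed template stands in here.
        return (f"a {self.age_group} {self.gender} from {self.country}, "
                f"speaking {self.language}, at {landmark}")

def compose_prompt(agents: list[PersonaAgent], landmark: str) -> str:
    """Merge each persona's contribution into one prompt that a
    text-to-image model could consume."""
    parts = [agent.describe(landmark) for agent in agents]
    return "A multicultural scene showing " + " and ".join(parts)

agents = [
    PersonaAgent("India", "young", "woman", "Hindi"),
    PersonaAgent("Romania", "elderly", "man", "Romanian"),
]
prompt = compose_prompt(agents, "the Taj Mahal")
```

The point of the sketch is only the division of labor: per-persona generation followed by aggregation, rather than a single model prompted once.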