Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages

📅 2025-10-06
🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit entity-centric cultural biases in Asian languages and how such biases affect downstream task performance. To this end, the authors introduce Camellia, a fine-grained, entity-level multilingual benchmark for cultural evaluation in Asian linguistic and cultural contexts, covering nine languages, 19,530 culturally annotated entities, and 2,173 naturally occurring masked contexts drawn from social media posts. Evaluations span cultural context adaptation, sentiment association, and entity extractive question answering, assessing LLMs' cultural adaptability, sentiment associations, and entity extraction capabilities. Results reveal persistent cultural adaptation deficits across recent multilingual LLMs, with performance differing across models developed in regions with varying access to culturally relevant data, and notable cross-cultural gaps in entity understanding. Camellia provides a reproducible, scalable benchmark for research on cultural fairness in Asian-language LLMs.
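To make the entity-centric evaluation concrete, here is a minimal, self-contained sketch of how a bias score over masked contexts could be computed. The data structure, function name, and log-probability values below are hypothetical illustrations, not the paper's actual pipeline or numbers: the idea is simply to measure, given model log-probabilities for candidate fill-ins of each masked context, how often the model prefers a Western-associated entity over the local Asian one.

```python
# Hypothetical sketch: entity-centric bias score over masked contexts.
# Each record holds toy model log-probabilities (not real outputs) for
# filling a masked slot with a local Asian entity vs. a Western entity.
contexts = [
    {"lang": "ja", "logp_asian": -2.1, "logp_western": -1.4},
    {"lang": "ja", "logp_asian": -0.9, "logp_western": -1.8},
    {"lang": "ko", "logp_asian": -3.0, "logp_western": -1.2},
]

def western_preference_rate(records):
    """Fraction of masked contexts where the Western entity is more probable."""
    prefers_western = sum(r["logp_western"] > r["logp_asian"] for r in records)
    return prefers_western / len(records)

rate = western_preference_rate(contexts)
print(f"Western-preference rate: {rate:.2f}")  # 2 of 3 contexts -> 0.67
```

A rate near 0.5 would indicate no systematic preference between the two candidate entities, while values near 1.0 would indicate a Western-entity bias; per-language breakdowns would follow by grouping records on the `lang` field.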

📝 Abstract
As Large Language Models (LLMs) gain stronger multilingual capabilities, their ability to handle culturally diverse entities becomes crucial. Prior work has shown that LLMs often favor Western-associated entities in Arabic, raising concerns about cultural fairness. Due to the lack of multilingual benchmarks, it remains unclear whether such biases also manifest in other non-Western languages. In this paper, we introduce Camellia, a benchmark for measuring entity-centric cultural biases in nine Asian languages spanning six distinct Asian cultures. Camellia includes 19,530 entities manually annotated for association with a specific Asian or Western culture, as well as 2,173 naturally occurring masked contexts for entities derived from social media posts. Using Camellia, we evaluate cultural biases in four recent multilingual LLM families across tasks such as cultural context adaptation, sentiment association, and entity extractive QA. Our analyses show that LLMs struggle with cultural adaptation in all Asian languages, with performance differing across models developed in regions with varying access to culturally relevant data. We further observe that LLM families exhibit distinct biases, differing in how they associate cultures with particular sentiments. Lastly, we find that LLMs struggle with context understanding in Asian languages, creating performance gaps between cultures in entity extraction.
Problem

Research questions and friction points this paper is trying to address.

Measuring cultural biases in multilingual LLMs for Asian languages
Evaluating entity-centric cultural adaptation across diverse Asian cultures
Assessing context understanding gaps in entity extraction tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for cultural bias in Asian languages
Manually annotated entities from social media contexts
Evaluates multilingual LLMs across diverse cultural tasks
Authors

Tarek Naous, Georgia Institute of Technology
Anagha Savit, Georgia Institute of Technology
Carlos Rafael Catalan, Samsung R&D Institute Philippines
Geyang Guo, Georgia Institute of Technology
Jaehyeok Lee, Sungkyunkwan University
Kyungdon Lee, Sungkyunkwan University
Lheane Marie Dizon, Samsung R&D Institute Philippines
Mengyu Ye, Tohoku University
Neel Kothari, Georgia Institute of Technology
Sahajpreet Singh, National University of Singapore
Sarah Masud, University of Copenhagen
Tanish Patwa, Georgia Institute of Technology
Trung Thanh Tran, Takenote.ai
Zohaib Khan, University of Michigan
Alan Ritter, Georgia Institute of Technology
JinYeong Bak, College of Computing, Sungkyunkwan University
Keisuke Sakaguchi, Tohoku University
Tanmoy Chakraborty, Indian Institute of Technology Delhi
Yuki Arase, Institute of Science Tokyo
Wei Xu, Georgia Institute of Technology