🤖 AI Summary
Large language models (LLMs) exhibit significant deficits in non-English cultural competence due to English-centric training data, and their grasp of Persian culture in particular has been underassessed. Method: We introduce PerCul, the first narrative-style, multiple-choice benchmark for evaluating LLMs' understanding of Persian culture. It is designed and annotated collaboratively by native Persian speakers to ensure cultural authenticity and to prevent translation from being used as a shortcut. PerCul integrates cultural modeling, multi-round expert annotation, and a comparative multi-model evaluation framework. Contribution/Results: PerCul provides the first systematic assessment of LLMs' Persian cultural understanding and reveals substantial gaps: the best closed-source model trails a native-speaker layperson baseline by 11.3% in accuracy, while the top open-weight model trails it by 21.3%. This work establishes a rigorous, culturally grounded evaluation paradigm and fills a critical gap in non-English cultural capability assessment.
📝 Abstract
Large language models predominantly reflect Western cultures, largely due to the dominance of English-centric training data. This imbalance presents a significant challenge, as LLMs are increasingly used across diverse contexts without adequate evaluation of their cultural competence in non-English languages, including Persian. To address this gap, we introduce PerCul, a carefully constructed dataset designed to assess the sensitivity of LLMs toward Persian culture. PerCul features story-based, multiple-choice questions that capture culturally nuanced scenarios. Unlike existing benchmarks, PerCul is curated with input from native Persian annotators to ensure authenticity and to prevent the use of translation as a shortcut. We evaluate several state-of-the-art multilingual and Persian-specific LLMs, establishing a foundation for future research in cross-cultural NLP evaluation. Our experiments show an 11.3% gap between the best closed-source model and the layperson baseline, a gap that widens to 21.3% for the best open-weight model. The dataset is available at https://huggingface.co/datasets/teias-ai/percul
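To make the reported gaps concrete, the sketch below scores multiple-choice answers by exact match on the option label and expresses a model-versus-baseline gap as a difference in accuracy. It rests on several assumptions not stated in the abstract: answers are single option letters, accuracy is plain exact-match over all items, and the gap is an absolute difference in percentage points rather than a relative one. The helper names `mc_accuracy` and `gap_points` are hypothetical, and the labels are toy values, not PerCul data or results.

```python
from typing import Sequence


def mc_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted option label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def gap_points(baseline_acc: float, model_acc: float) -> float:
    """Assumed definition: absolute accuracy gap in percentage points (baseline minus model)."""
    return 100.0 * (baseline_acc - model_acc)


# Toy, made-up option labels for illustration only; these are NOT PerCul items or results.
gold = ["A", "C", "B", "D", "A", "B", "C", "D", "A", "B"]
human = ["A", "C", "B", "D", "A", "B", "C", "A", "A", "B"]  # 9 of 10 correct
model = ["A", "C", "D", "D", "B", "B", "C", "A", "A", "B"]  # 7 of 10 correct

h = mc_accuracy(human, gold)
m = mc_accuracy(model, gold)
print(f"human={h:.2f} model={m:.2f} gap={gap_points(h, m):.1f} pts")
# prints "human=0.90 model=0.70 gap=20.0 pts"
```

If the paper instead reports a relative gap, `gap_points` would divide the difference by `baseline_acc`; the abstract alone does not settle which convention is used.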