NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models

📅 2024-04-18
📈 Citations: 4
Influential: 0
📄 PDF
🤖 AI Summary
Current large language models (LLMs) show significant deficiencies in cross-cultural social adaptability, particularly in judging social acceptability across diverse cultural contexts, whether guided by abstract values or by explicit social norms. Method: The paper introduces NormAd, an evaluation framework for quantifying LLMs' cultural adaptability across varying levels of cultural norm specificity, from abstract values to explicit, country-specific norms. It instantiates the framework as NormAd-Eti, a benchmark of 2.6k situational descriptions covering social-etiquette norms from 75 countries, and evaluates models via scenario-based acceptability judgments under multiple levels of provided cultural context, compared against human baselines. Results: Even the strongest models achieve <82% accuracy when the relevant social norm is given explicitly (vs. >95% for humans), dropping below 60% when only abstract values and country identifiers are provided (vs. >90% for humans). Notably, models show stronger adaptability to English-centric cultures than to those of the Global South, underscoring critical gaps in culturally grounded reasoning.
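The evaluation protocol described above (judging each situation's social acceptability under progressively less specific cultural context, then comparing per-level accuracy) can be sketched as follows. This is a minimal illustrative harness, not the authors' code: the three context levels, the yes/no/neutral label set, and the `query_model` stub are assumptions for the example.

```python
# Hypothetical sketch of a NormAd-style evaluation loop.
# Assumed context levels: explicit norm ("rule_of_thumb"), abstract "value",
# and bare "country" identifier; assumed labels: yes / no / neutral.
from collections import defaultdict

def query_model(prompt: str) -> str:
    # Placeholder for a real LLM call; here it always answers "yes"
    # so the sketch runs without any external dependency.
    return "yes"

def evaluate(dataset):
    """dataset: list of dicts with 'story', a gold label in 'gold'
    ('yes'/'no'/'neutral'), and one context string per level."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in dataset:
        for level in ("rule_of_thumb", "value", "country"):
            prompt = (f"Context: {ex[level]}\n"
                      f"Situation: {ex['story']}\n"
                      "Is the behavior socially acceptable? "
                      "Answer yes, no, or neutral.")
            pred = query_model(prompt).strip().lower()
            correct[level] += int(pred == ex["gold"])
            total[level] += 1
    # Per-level accuracy, comparable against a human baseline.
    return {lvl: correct[lvl] / total[lvl] for lvl in total}
```

A real harness would swap `query_model` for an API call and report the per-level accuracies side by side, which is how the gap between explicit-norm (<82%) and value-plus-country (<60%) settings surfaces.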

📝 Abstract
To be effectively and safely deployed to global user populations, large language models (LLMs) may need to adapt outputs to user values and cultures, not just know about them. We introduce NormAd, an evaluation framework to assess LLMs' cultural adaptability, specifically measuring their ability to judge social acceptability across varying levels of cultural norm specificity, from abstract values to explicit social norms. As an instantiation of our framework, we create NormAd-Eti, a benchmark of 2.6k situational descriptions representing social-etiquette related cultural norms from 75 countries. Through comprehensive experiments on NormAd-Eti, we find that LLMs struggle to accurately judge social acceptability across these varying degrees of cultural contexts and show stronger adaptability to English-centric cultures over those from the Global South. Even in the simplest setting where the relevant social norms are provided, the best LLMs' performance (<82%) lags behind humans (>95%). In settings with abstract values and country information, model performance drops substantially (<60%), while human accuracy remains high (>90%). Furthermore, we find that models are better at recognizing socially acceptable versus unacceptable situations. Our findings showcase the current pitfalls in socio-cultural reasoning of LLMs which hinder their adaptability for global audiences.
Problem

Research questions and friction points this paper is trying to address.

How can LLMs' cultural adaptability be measured, beyond their cultural knowledge?
How accurately do LLMs judge social acceptability across cultures and levels of norm specificity?
How does model performance vary across global cultural contexts, e.g., English-centric vs. Global South?
Innovation

Methods, ideas, or system contributions that make the work stand out.

NormAd, a framework for assessing cultural adaptability at varying levels of norm specificity
NormAd-Eti, a 2.6k-item benchmark of social-etiquette norms from 75 countries
Empirical analysis of LLMs' socio-cultural reasoning and its gaps relative to humans