🤖 AI Summary
This study addresses the problem that large language models (LLMs) struggle to capture cultural nuances in the expression of social emotions, which can lead to culturally inappropriate responses in cross-cultural interactions. The authors propose an evaluation framework grounded in cultural psychology and use it to assess how well six frontier LLMs reproduce the engaging and disengaging emotional expressions of European American and Latin American participants. Drawing on a human study as the benchmark, together with ablations over sampling temperature, prompt language, and response elicitation format, they find that all models express engaging emotions more than disengaging ones, most starkly when portraying the European American persona, and that model outputs are markedly less diverse than human responses. This pattern is robust to sampling temperature, partially sensitive to prompt language, and dependent on the response elicitation format. The work exposes limitations in how current LLMs model culturally specific emotion expression and provides a human-grounded reference for improving cross-cultural alignment.
📝 Abstract
The expression of emotions that serve social purposes, such as asserting independence or fostering interdependence, is central to human interactions and varies systematically across cultures. As LLMs are increasingly used to simulate human behavior in culturally nuanced interactions, it is important to understand whether they faithfully capture human patterns of social emotion expression. When LLM responses are not culturally aligned, their utility is compromised, particularly when users assume they are interacting with a culturally attuned interlocutor and may act on advice that proves inappropriate in their cultural context. We present a psychologically informed evaluation framework of cross-cultural social emotion expression in LLMs. Using a human study comparing European American and Latin American participants' expression of engaging and disengaging emotions, we evaluate six frontier LLMs on their ability to reflect culturally differentiated patterns for expressing social emotions. We find systematic misalignment between model and human behavior: all models express engaging emotions more than disengaging ones, with particularly stark differences observed for the generally well-represented European American persona. We further highlight that LLM responses are highly concentrated and deterministic, failing to capture the diversity of human responses in expressing social emotions. Our ablation analyses reveal that these patterns are robust to sampling temperatures, partially sensitive to prompt language, and dependent on the response elicitation format. Together, our findings highlight limitations in how current LLMs represent the interaction of cultural and emotional axes, particularly when expressing social emotions, with direct implications for their deployment in cross-cultural affective contexts.
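The two headline findings (a skew toward engaging emotions and highly concentrated responses) can be illustrated with a small sketch. The abstract does not state the paper's actual metrics or emotion items, so everything below is an assumption made for illustration: the emotion labels are placeholders, the toy response lists are invented, and the two measures (share of engaging responses, normalized Shannon entropy) are one plausible way to quantify such patterns, not necessarily the authors' method.

```python
from collections import Counter
from math import log2

# Placeholder emotion labels (assumption): the source does not list the items used.
ENGAGING = {"friendly", "close", "respectful"}
DISENGAGING = {"proud", "superior", "angry"}
ALL_LABELS = ENGAGING | DISENGAGING


def engaging_share(responses):
    """Fraction of engaging labels among engaging and disengaging responses."""
    relevant = [r for r in responses if r in ALL_LABELS]
    if not relevant:
        raise ValueError("no engaging or disengaging responses found")
    return sum(r in ENGAGING for r in relevant) / len(relevant)


def normalized_entropy(responses, n_labels=len(ALL_LABELS)):
    """Shannon entropy of the response distribution, scaled to [0, 1].

    0 means every response is identical (fully concentrated);
    1 means responses are spread evenly over all n_labels options.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return entropy / log2(n_labels)


# Invented toy data, for illustration only (not results from the study).
human_like = ["friendly", "proud", "close", "angry", "respectful", "superior"]
model_like = ["friendly"] * 9 + ["close"]

for name, sample in [("human-like", human_like), ("model-like", model_like)]:
    print(name, round(engaging_share(sample), 2), round(normalized_entropy(sample), 2))
```

In this toy setup, the model-like sample has both a higher engaging share and a lower normalized entropy than the human-like sample, which is the qualitative pattern the abstract describes. Repeating the same computation across sampling temperatures, prompt languages, or elicitation formats would be one way to probe the robustness claims.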