🤖 AI Summary
This study investigates whether large language models adapt their pragmatic style to a cultural context that is only implied by the situation, without explicit instructions. Using 60 cross-cultural dialogue scenarios in five languages (English, German, Hindi, Nepali, Urdu), the authors compare model responses under neutral prompts, explicit cultural directives, and implicit contextual cues. They introduce the Pragmatic Context Sensitivity (PCS) metric, presented as the first systematic measure of how much of the pragmatic shift a model produces under explicit instruction reappears under implicit cues alone, and they use Hindi and Urdu as a natural control pair. Combining an evaluation of four deployed LLMs, annotation of 12 pragmatic features, and statistical significance testing, the experiments show that models recover only 19.6% of the explicit pragmatic shift on average (PCS = 0.196). Sensitivity is highest for authority-related cues (0.299) and lowest for individual-versus-group framing (0.120). The Hindi-Urdu comparison finds no reliable baseline difference, suggesting that linguistic structure exerts a stronger influence than the cultural associations a language carries.
📝 Abstract
Many benchmarks show that large language models can answer direct questions about culture. We study a different question: do they also change how they speak when culture is only implied by the situation? We evaluate 60 culturally grounded conversational scenarios across five languages in three conditions: a neutral baseline (Prompt A), an explicit cultural instruction (Prompt B), and implicit situational cueing (Prompt C). We score responses on 12 pragmatic features covering deference to authority, individual-versus-group framing, and uncertainty management. We define Pragmatic Context Sensitivity (PCS) as the fraction of the Prompt A->B shift that reappears under Prompt A->C. Across four deployed LLMs and five languages (English, German, Hindi, Nepali, Urdu), the primary stable-only PCS mean is 0.196 (SD = 0.113), indicating that the models recover only about one-fifth of the pragmatic shift they can produce when instructed explicitly. Transfer is strongest for authority-related cues (0.299) and weakest for individual-versus-group framing (0.120). Uncertainty-related behaviour is mixed: hedging density exhibits negative explicit gaps in all five languages, suggesting that alignment training actively suppresses the target behaviour. Because Hindi and Urdu share core grammar yet index distinct cultural communities, we use them as a natural control; a paired analysis finds no reliable baseline difference (t = 0.96, p = 0.339, dz = 0.06), suggesting that models respond primarily to linguistic structure rather than to the cultural associations a language carries. We argue that multilingual cultural pragmatics is an explicit-versus-implicit deployment problem, not only a factual knowledge problem.
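The PCS definition in the abstract can be read as a simple ratio of two shifts. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pcs`, the `min_gap` threshold, and the example scores are all hypothetical, and the abstract does not specify how "stable-only" filtering or sign conventions are handled.

```python
from typing import Optional


def pcs(score_a: float, score_b: float, score_c: float,
        min_gap: float = 1e-9) -> Optional[float]:
    """Fraction of the explicit shift (A->B) that reappears implicitly (A->C).

    score_a: feature score under the neutral baseline (Prompt A)
    score_b: feature score under the explicit cultural instruction (Prompt B)
    score_c: feature score under implicit situational cueing (Prompt C)
    min_gap: assumed threshold below which the explicit gap is treated as
             too small to yield a meaningful ratio (hypothetical choice)
    """
    explicit_gap = score_b - score_a   # shift the model can produce when told
    implicit_gap = score_c - score_a   # shift the model produces from cues alone
    if abs(explicit_gap) < min_gap:
        return None                    # ratio undefined without an explicit shift
    return implicit_gap / explicit_gap


# Hypothetical scores for one pragmatic feature in one language.
print(pcs(1.0, 3.0, 1.5))  # explicit shift of 2.0, implicit shift of 0.5
```

Under this reading, a value of 1 would mean the implicit cue reproduces the full explicit shift and a value near 0 would mean the cue has almost no effect, which is how the reported mean of 0.196 corresponds to "about one-fifth" of the explicit shift.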