🤖 AI Summary
This study investigates whether large language models possess genuinely culturally grounded reasoning capabilities or merely perform superficial cultural translation. To this end, it introduces a novel probing task, metaphor generation across five distinct cultural contexts, and combines prompt engineering, qualitative content analysis, and computational auditing to systematically evaluate the models' cultural stance in creative writing. The findings reveal a pervasive Western-centric bias: even when explicitly prompted with specific cultural identities, the models frequently resort to stereotypical metaphors and fail to produce reasoning authentically embedded in the target culture. This work exposes fundamental limitations in current LLMs' cultural understanding and offers an evaluation approach for developing culturally sensitive AI.
📝 Abstract
Large language models (LLMs) are often described as multilingual because they can understand and respond in many languages. However, speaking a language is not the same as reasoning within a culture. This distinction motivates a critical question: do LLMs truly perform culture-aware reasoning? This paper presents a preliminary computational audit of cultural inclusivity in a creative writing task. We empirically examine whether LLMs act as culturally diverse creative partners or merely as cultural translators that apply a dominant conceptual framework with localized expressions. Using a metaphor generation task spanning five cultural settings and several abstract concepts as a case study, we find that the models exhibit stereotyped metaphor usage in certain settings, as well as Western defaultism. These findings suggest that merely prompting an LLM with a cultural identity does not guarantee culturally grounded reasoning.
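To make the probing setup concrete, the sketch below shows one plausible way to build a grid of culture-conditioned metaphor prompts of the kind the audit describes. The culture names, concepts, and prompt wording are hypothetical placeholders for illustration, not the paper's actual materials.

```python
from itertools import product

# Hypothetical placeholders: the paper's actual five cultural settings
# and abstract concepts are not reproduced here.
CULTURES = ["Culture A", "Culture B", "Culture C", "Culture D", "Culture E"]
CONCEPTS = ["time", "love", "death"]

def build_prompt(culture: str, concept: str) -> str:
    """Condition the model on a cultural identity before the creative task."""
    return (
        f"You are a native member of {culture}. "
        f"Write a metaphor for '{concept}' that is grounded in your culture."
    )

# One prompt per (culture, concept) pair forms the audit grid; each model
# response would then be checked for stereotyping and Western defaultism.
prompts = [build_prompt(c, k) for c, k in product(CULTURES, CONCEPTS)]
```

In such a design, the same concept is posed under every cultural identity, so systematic overlap in the generated metaphors (rather than culture-specific imagery) would signal a shared, likely Western-default conceptual frame.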