AI Summary
This study addresses the frequent generation of unsafe or culturally inappropriate content by generative AI in cross-cultural interactions, which stems from insufficient modeling and evaluation of sociocultural norms. To bridge this gap, the work proposes a multidimensional taxonomy that integrates normative context, domain specificity, and enforcement mechanisms, distinguishing interpersonal (human-human) norms from norms governing the human-AI interaction itself. Building on this framework, the authors develop an evaluation pipeline that operationalizes the taxonomy for automated, context-sensitive detection of norm violations in naturalistic, open-ended dialogue. Exploratory analyses indicate that state-of-the-art large language models frequently violate cultural norms, with violation rates varying by model, country, interactional context, prompt intent, and situational framing.
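As a rough illustration only (not the paper's actual schema), the taxonomy's dimensions could be encoded along these lines; `NormContext`, `Enforcement`, and `Norm` are hypothetical names chosen for this sketch:

```python
# Hypothetical sketch of the taxonomy dimensions described above; field names
# and enum values are illustrative, not the authors' schema.
from dataclasses import dataclass
from enum import Enum


class NormContext(Enum):
    HUMAN_HUMAN = "human-human"  # interpersonal norms the model should recognize
    HUMAN_AI = "human-ai"        # norms applying to the human-AI interaction itself


class Enforcement(Enum):
    SOCIAL_SANCTION = "social sanction"
    LEGAL = "legal"
    INTERNALIZED = "internalized"


@dataclass
class Norm:
    description: str          # natural-language statement of the norm
    context: NormContext      # whose interaction the norm governs
    domain: str               # e.g., "greetings", "dining etiquette"
    country: str              # cultural setting the norm is drawn from
    enforcement: Enforcement  # how the norm is typically enforced


# Purely illustrative entry:
bowing = Norm(
    description="Greet elders with a bow rather than a handshake.",
    context=NormContext.HUMAN_HUMAN,
    domain="greetings",
    country="Japan",
    enforcement=Enforcement.SOCIAL_SANCTION,
)
```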
Abstract
Generative AI models ought to be useful and safe in cross-cultural contexts. One critical step toward this goal is understanding how AI models adhere to sociocultural norms. While this challenge has gained attention in NLP, existing work lacks both nuance and coverage in understanding and evaluating models' norm adherence. We address these gaps by introducing a taxonomy of norms that clarifies their contexts (e.g., distinguishing between human-human norms that models should recognize and human-AI interactional norms that apply to the human-AI interaction itself), specifications (e.g., relevant domains), and mechanisms (e.g., modes of enforcement). We demonstrate how our taxonomy can be operationalized to automatically evaluate models' norm adherence in naturalistic, open-ended settings. Our exploratory analyses suggest that state-of-the-art models frequently violate norms, though violation rates vary by model, interactional context, and country. We further show that violation rates also vary by prompt intent and situational framing. Our taxonomy and demonstrative evaluation pipeline enable nuanced, context-sensitive evaluation of cultural norm adherence in realistic settings.
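For intuition only, a minimal sketch of how such a taxonomy might drive automated adherence evaluation is shown below. It reuses the hypothetical `Norm` dataclass from the sketch above; `violation_rate` and `toy_judge` are illustrative placeholders rather than the authors' pipeline, in which the judging step would involve an LLM rather than a keyword check:

```python
# Minimal sketch of taxonomy-driven violation scoring; assumes the Norm
# dataclass and `bowing` example defined in the earlier sketch.
from typing import Callable, Iterable


def violation_rate(
    responses: Iterable[str],
    norms: Iterable[Norm],
    judge: Callable[[str, Norm], bool],
) -> float:
    """Fraction of (response, norm) pairs the judge flags as violations."""
    norm_list = list(norms)  # materialize so every response is paired with every norm
    pairs = [(r, n) for r in responses for n in norm_list]
    if not pairs:
        return 0.0
    return sum(judge(r, n) for r, n in pairs) / len(pairs)


# Placeholder judge: a real pipeline would prompt a strong LLM with the norm
# description, country, and dialogue context, then parse its verdict.
def toy_judge(response: str, norm: Norm) -> bool:
    return norm.domain == "greetings" and "handshake" in response.lower()


print(violation_rate(["I reached out for a firm handshake."], [bowing], toy_judge))
```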