LLM-Human Pipeline for Cultural Context Grounding of Conversations

📅 2024-10-17
🏛️ arXiv.org
🤖 AI Summary
NLP models frequently misinterpret social norms in cross-cultural dialogues due to insufficient cultural context modeling. To address this, we propose the *Cultural Context Schema*—the first dual-track symbolic framework jointly modeling dialogue acts, affective states, social norms, and their violation patterns. We design a human-in-the-loop *Norm Concept* structuring methodology, integrating LLM generation with expert refinement to construct a high-quality Chinese cultural norm knowledge base containing 110,000 entries. An automated validation–human calibration feedback loop ensures reliability, and we establish a culture-aware evaluation protocol. Leveraging this framework, we build the first large-scale Chinese cultural context dataset (23K dialogues). Empirical results demonstrate significant improvements in emotion recognition, sentiment classification, and dialogue act prediction. Our work provides an interpretable, scalable paradigm for developing culturally sensitive dialogue systems.

📝 Abstract
Conversations often adhere to well-understood social norms that vary across cultures. For example, while "addressing parents by name" is commonplace in the West, it is rare in most Asian cultures. Adherence to or violation of such norms often dictates the tenor of conversations. Humans navigate social situations requiring cultural awareness quite adeptly; however, this is a hard task for NLP models. In this paper, we tackle this problem by introducing a "Cultural Context Schema" for conversations. It comprises (1) conversational information such as emotions, dialogue acts, etc., and (2) cultural information such as social norms, violations, etc. We generate ~110k social norm and violation descriptions for ~23k conversations from Chinese culture using LLMs. We refine them using automated verification strategies, which are evaluated against culturally aware human judgements. We organize these descriptions into meaningful structures we call "Norm Concepts", using an interactive human-in-the-loop framework. We ground the norm concepts and the descriptions in conversations using symbolic annotation. Finally, we use the obtained dataset for downstream tasks such as emotion, sentiment, and dialogue act detection. We show that it significantly improves the empirical performance.
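The abstract's two-track schema, conversational information (emotions, dialogue acts) alongside cultural information (norms, violations), lends itself to a simple structured representation. The sketch below is illustrative only: the class and field names are assumptions, not the paper's actual data format, and the example instance reuses the abstract's "addressing parents by name" scenario.

```python
from dataclasses import dataclass

@dataclass
class ConversationalContext:
    # Track 1 of the schema: dialogue-level information (field names assumed)
    emotions: list[str]
    dialogue_acts: list[str]

@dataclass
class CulturalContext:
    # Track 2 of the schema: cultural information (field names assumed)
    norms: list[str]        # social norms applicable to the utterance
    violations: list[str]   # descriptions of norms the utterance violates

@dataclass
class SchemaEntry:
    # One symbolically annotated utterance grounded in both tracks
    utterance: str
    conversational: ConversationalContext
    cultural: CulturalContext

# Hypothetical entry for the abstract's running example
entry = SchemaEntry(
    utterance="She addressed her father by his first name.",
    conversational=ConversationalContext(
        emotions=["surprise"], dialogue_acts=["inform"]
    ),
    cultural=CulturalContext(
        norms=["address parents with kinship terms"],
        violations=["addressed a parent by name"],
    ),
)
```

Grouping the violation descriptions under shared norm labels (the paper's "Norm Concepts") would then amount to indexing entries like this by their `norms` values.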
Problem

Research questions and friction points this paper is trying to address.

Addressing cultural norm variations in NLP conversations
Automating cultural context annotation using LLMs and humans
Improving dialogue tasks with culturally grounded datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-generated cultural norm descriptions
Automated verification with human feedback
Symbolic annotation for conversation grounding
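The innovation bullets above describe a loop in which LLM-generated descriptions pass through automated verification that is itself checked against human judgement. A minimal sketch of that control flow, with `auto_verify` and `human_calibrate` as hypothetical stand-ins for the paper's verification strategies and human evaluation:

```python
def refine_descriptions(descriptions, auto_verify, human_calibrate, max_rounds=3):
    """Hypothetical sketch of the verification/calibration loop.

    `auto_verify(d)` stands in for the automated verification strategies;
    `human_calibrate(batch)` stands in for the culturally aware human
    judgement that decides whether the verifier's output is reliable.
    """
    verified = descriptions
    for _ in range(max_rounds):
        # Keep only descriptions that pass the automated checks
        verified = [d for d in verified if auto_verify(d)]
        # Humans spot-check the verified batch; accept it if they agree
        if human_calibrate(verified):
            return verified
        # Otherwise iterate: the surviving set is re-verified next round
    return verified

# Toy usage: a string-prefix "verifier" and a size-based "calibration"
kept = refine_descriptions(
    ["valid: norm A", "noisy output", "valid: violation B"],
    auto_verify=lambda d: d.startswith("valid:"),
    human_calibrate=lambda batch: len(batch) == 2,
)
```

The real pipeline would use LLM- or rule-based verifiers and sampled human annotation rather than these toy callables; the sketch only shows how the two feedback signals interleave.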