🤖 AI Summary
Existing emotion recognition resources are often out-of-domain, coarsely annotated, and geared toward post-hoc detection, which limits their use in emotion-aware customer service. To address this, the study runs a controlled Wizard-of-Oz (WOZ) experiment to build EmoWOZ-CS, a bilingual (Dutch-English), multi-label corpus of written dialogues covering commercial aviation, e-commerce, online travel agency, and telecommunication scenarios. Annotations combine valence, arousal, and dominance ratings with fine-grained categorical emotion labels, and operators steer each conversation toward a prescribed valence trajectory, which supports analysis of operator strategies and benchmarks for both emotion detection and forward-looking emotion inference. The corpus comprises 2,148 dialogues from 179 participants. Key patterns include the dominance of neutral messages, positive reciprocity after cheerful or grateful operator responses, and successful trajectory steering that is most distinct for negative targets. The study also quantifies divergences between self-reports and third-party annotations and shows that inferring upcoming emotions from prior turns remains difficult.
📝 Abstract
Emotion-aware customer service needs in-domain conversational data, rich annotations, and predictive capabilities, but existing resources for emotion recognition are often out-of-domain, narrowly labeled, and focused on post-hoc detection. To address this, we conducted a controlled Wizard of Oz (WOZ) experiment to elicit interactions with targeted affective trajectories. The resulting corpus, EmoWOZ-CS, contains 2,148 bilingual (Dutch-English) written dialogues from 179 participants across commercial aviation, e-commerce, online travel agencies, and telecommunication scenarios. Our contributions are threefold: (1) Evaluate WOZ-based operator-steered valence trajectories as a design for emotion research; (2) Quantify human annotation performance and variation, including divergences between self-reports and third-party judgments; (3) Benchmark detection and forward-looking emotion inference in real-time support. Findings show neutral dominates participant messages; desire and gratitude are the most frequent non-neutral emotions. Agreement is moderate for multilabel emotions and valence, lower for arousal and dominance; self-reports diverge notably from third-party labels, aligning most for neutral, gratitude, and anger. Objective strategies often elicit neutrality or gratitude, while suboptimal strategies increase anger, annoyance, disappointment, desire, and confusion. Some affective strategies (cheerfulness, gratitude) foster positive reciprocity, whereas others (apology, empathy) can also leave desire, anger, or annoyance. Temporal analysis confirms successful conversation-level steering toward prescribed trajectories, most distinctly for negative targets; positive and neutral targets yield similar final valence distributions. Benchmarks highlight the difficulty of forward-looking emotion inference from prior turns, underscoring the complexity of proactive emotion-aware support.