🤖 AI Summary
This study addresses the frequent interaction failures of conversational AI in emotionally charged and ethically sensitive scenarios. The authors propose a role-conditioned user simulator that integrates psychological personality traits and emotional dynamics to conduct multi-turn stress tests on mainstream chatbots. Combining qualitative and quantitative analyses, they systematically identify and categorize key failure modes—including affective misalignment, ineffective ethical guidance, and problematic empathy–responsibility trade-offs—revealing recurrent dialogue breakdowns as emotional intensity escalates. The work establishes a diagnostic framework tailored to value-sensitive conversational contexts, offering both theoretical grounding and practical design guidance to enhance the ethical consistency and emotional sensitivity of dialogue systems.
📝 Abstract
Conversational AI is increasingly deployed in emotionally charged and ethically sensitive interactions. Previous research has primarily concentrated on emotional benchmarks or static safety checks, overlooking how alignment unfolds in evolving conversation. We explore the research question: what breakdowns arise when conversational agents confront emotionally and ethically sensitive behaviors, and how do these affect dialogue quality? To stress-test chatbot performance, we develop a persona-conditioned user simulator capable of engaging in multi-turn dialogue with psychological personas and staged emotional pacing. Our analysis reveals that mainstream models exhibit recurrent breakdowns that intensify as emotional trajectories escalate. We identify several common failure patterns, including affective misalignments, ethical guidance failures, and cross-dimensional trade-offs where empathy supersedes or undermines responsibility. We organize these patterns into a taxonomy and discuss the design implications, highlighting the necessity to maintain ethical coherence and affective sensitivity throughout dynamic interactions. The study offers the HCI community a new perspective on the diagnosis and improvement of conversational AI in value-sensitive and emotionally charged contexts.