🤖 AI Summary
Dialogue evaluation today relies mostly on the quality of single-turn responses. It lacks a robust automated method for comparing the overall interaction dynamics of a conversation (its "shape"), which hinders systematic assessment of dialogue agents. To address this, we propose the first robust similarity metric for dialogue dynamics. It combines sequential modeling with dynamic time warping to extract temporal interaction features and build a structural similarity model based on behavior patterns. The method explicitly encodes contextual factors, including topic evolution and power asymmetry, and uses an interpretable validation framework to assess the metric's sensitivity and stability. Empirical evaluation on data from a large online community shows that the metric captures systematic structural effects of power differentials on dialogue flow, supporting more automated and more interpretable analysis of complex dialogue behavior.
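For readers unfamiliar with dynamic time warping, the sketch below shows the generic textbook algorithm on two sequences of per-turn scores. It is an illustration only and not the authors' implementation: the function name `dtw_distance` and the idea of representing each conversation as a list of one number per turn are assumptions made for this example.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two numeric sequences.

    Hypothetical setup: each sequence holds one score per conversation turn
    (for example, some per-turn interaction feature). The per-step cost is
    the absolute difference between the aligned values.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a only, in b only, or in both
            cost[i][j] = step + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# Two conversations with the same trajectory at different pacing align at zero cost.
slow = [1.0, 2.0, 2.0, 3.0]
fast = [1.0, 2.0, 3.0]
print(dtw_distance(slow, fast))
```

The warping step is what lets two conversations that follow the same trajectory at different speeds count as similar, which a turn-by-turn comparison would miss.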
📝 Abstract
The quality of a conversation depends on more than the quality of each individual reply: it emerges from how replies combine into interactional patterns that give the conversation its distinctive overall "shape". However, there is no robust automated method for comparing conversations in terms of their overall interactional dynamics. Such methods could enhance the analysis of conversational data and help evaluate conversational agents more holistically.
In this work, we introduce a similarity measure for comparing conversations with respect to their dynamics. We design a validation framework for testing the robustness of the metric in capturing differences in conversation dynamics and for assessing its sensitivity to the topic of the conversations. Finally, to illustrate the measure's utility, we use it to analyze conversational dynamics in a large online community, bringing new insights into the role of situational power in conversations.