🤖 AI Summary
Prior research predominantly focuses on toxic or polarized dialogue, leaving quantitative assessment of prosocial, constructive dialogue underexplored. Method: The paper proposes "responsivity", defined as whether an utterance responds to a preceding turn, as a core computable dimension of dialogue quality. It combines semantic-similarity metrics with large language models (LLMs) to identify responsivity relations, evaluates both methods against human-annotated ground truth, and uses the better-performing LLM approach to further distinguish substantive from non-substantive responses, yielding a multi-level framework for analyzing dialogue structure. Results: Experiments confirm that LLMs detect responsivity effectively, and conversation-level metrics derived from responsivity links generalize across diverse conversational settings. Contribution: The work offers a systematic definition and quantification of responsivity as a dialogue-quality indicator, along with a reproducible and scalable evaluation paradigm for constructive dialogue.
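As a concrete illustration of the LLM-based relation labeling described above, here is a minimal sketch that prompts a chat model to label the relation between two turns. The label set, prompt wording, and model choice are illustrative assumptions, not the paper's actual protocol.

```python
# Hypothetical sketch: asking an LLM whether turn B responds to turn A,
# and whether the response is substantive. Labels and prompt wording are
# illustrative assumptions, not the paper's actual protocol.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Turn A: {a}\n"
    "Turn B: {b}\n\n"
    "Does Turn B respond to Turn A? Answer with exactly one label:\n"
    "SUBSTANTIVE - B engages with the content of A\n"
    "NON_SUBSTANTIVE - B acknowledges A without engaging its content\n"
    "NO_RESPONSE - B does not respond to A"
)

def classify_relation(turn_a: str, turn_b: str) -> str:
    """Return a single relation label for a pair of adjacent turns."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": PROMPT.format(a=turn_a, b=turn_b)}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```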
📝 Abstract
Growing literature explores toxicity and polarization in discourse, with comparatively less work on characterizing what makes dialogue prosocial and constructive. We explore conversational discourse and investigate a method for characterizing its quality built upon the notion of "responsivity": whether one person's conversational turn responds to a preceding turn. We develop and evaluate two methods for quantifying responsivity, first through semantic similarity of speaker turns, and second by leveraging state-of-the-art large language models (LLMs) to identify the relation between two speaker turns. We evaluate both methods against a ground-truth set of human-annotated conversations. Selecting the better-performing LLM-based approach, we then characterize the nature of each response: whether it engaged the preceding turn in a substantive way or not.
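A minimal sketch of the similarity-based variant follows; the sentence-transformer encoder and the similarity threshold are illustrative assumptions, not the paper's reported configuration.

```python
# Hypothetical sketch: scoring responsivity between adjacent turns via
# embedding cosine similarity. Encoder name and threshold are
# illustrative assumptions, not the paper's reported configuration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

def is_responsive(prev_turn: str, curr_turn: str, threshold: float = 0.4) -> bool:
    """Treat the current turn as responding to the preceding one if
    their embeddings are sufficiently similar."""
    emb = model.encode([prev_turn, curr_turn], convert_to_tensor=True)
    score = util.cos_sim(emb[0], emb[1]).item()
    return score >= threshold
```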
We view these responsivity links as a fundamental aspect of dialogue but note that conversations can exhibit significantly different responsivity structures. Accordingly, we develop conversation-level derived metrics that capture various aspects of conversational discourse. Applying these metrics to further conversations, we show that they support meaningful characterization and differentiation across a diverse collection of conversations.
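For instance, once each turn carries a responsivity label, simple conversation-level aggregates can be computed. The two measures below (response rate and substantive share) are hypothetical examples of such derived metrics, not the paper's exact definitions.

```python
# Hypothetical sketch: aggregating per-turn responsivity labels into
# conversation-level metrics. These aggregates are illustrative
# examples, not the paper's exact derived metrics.
from typing import List

def response_rate(labels: List[str]) -> float:
    """Fraction of labeled turns that respond to a preceding turn at all."""
    responsive = [l for l in labels if l != "NO_RESPONSE"]
    return len(responsive) / len(labels) if labels else 0.0

def substantive_share(labels: List[str]) -> float:
    """Among responding turns, the fraction that engage substantively."""
    responsive = [l for l in labels if l != "NO_RESPONSE"]
    substantive = [l for l in responsive if l == "SUBSTANTIVE"]
    return len(substantive) / len(responsive) if responsive else 0.0

# Example: a short conversation with mixed responsivity
labels = ["SUBSTANTIVE", "NON_SUBSTANTIVE", "NO_RESPONSE", "SUBSTANTIVE"]
print(response_rate(labels))      # 0.75
print(substantive_share(labels))  # ~0.67
```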