🤖 AI Summary
This study addresses the challenge of automatic turn segmentation in spontaneous speech transcripts. Unlike conventional coarse-grained approaches relying on silence duration or speaker change, we propose a fine-grained turn segmentation method that explicitly distinguishes primary speaker turns from secondary listener responses—including backchannels, interjections, and overlapping speech—to capture the dynamic, interactive structure of natural conversation. Our approach integrates discourse function identification with contextual temporal modeling, employing a rule-augmented statistical framework trained and optimized on large-scale conversational corpora. Experimental results demonstrate significant improvements in turn boundary accuracy on real-world data. The resulting transcripts exhibit enhanced statistical robustness and improved capacity for inferring social interaction patterns. This work provides a more reliable foundational tool for computational dialogue analysis and empirical social science research.
📝 Abstract
Conversation is the subject of increasing interest in the social, cognitive, and computational sciences. And yet, as conversational datasets continue to increase in size and complexity, researchers lack scalable methods to segment speech-to-text transcripts into conversational turns-the basic building blocks of social interaction. We discuss this challenge and then introduce"NaturalTurn,"a turn segmentation algorithm designed to accurately capture the dynamics of naturalistic exchange. NaturalTurn operates by distinguishing speakers' primary conversational turns from listeners' secondary utterances, such as backchannels, brief interjections, and other forms of parallel speech that characterize conversation. Using data from a large conversation corpus, we show how NaturalTurn-derived transcripts demonstrate favorable statistical and inferential characteristics compared to transcripts derived from existing methods. The NaturalTurn algorithm represents an improvement in machine-generated transcript processing methods, or"turn models"that will enable researchers to link turn-taking dynamics with the broader outcomes that result from social interaction, a central goal of conversation science.