🤖 AI Summary
This paper addresses the lack of a turn-taking annotation framework that can be used across disciplines. We propose a two-layer, time-aligned annotation scheme, consisting of Inter-Pausal Units (IPUs) and points of potential completion (PCOMPs, similar to transition relevance places). It is designed to meet three requirements at once: sequential criteria from Conversation Analysis, time alignment suitable for phonetic analysis, and a compact, continuous labelling that is tractable for automatic classification. Applied to 95 minutes of conversational speech from the GRASS corpus, the annotations were time-aligned in Praat, segmented and labelled manually according to sequential criteria, and reach high inter-annotator agreement (Cohen’s κ > 0.95 for IPUs; κ ≈ 0.75 for PCOMPs). To our knowledge, this is the first protocol that systematically combines conversation-analytic grounding, suitability for phonetic analysis, and feasibility for automatic classification in a standardized, reusable, and extensible form. The annotated data, made available to the scientific community, and a detailed description of the annotation criteria support cross-disciplinary research at the intersection of linguistics and artificial intelligence.
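The agreement figures above are Cohen’s κ, which corrects the raw proportion of matching labels for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The minimal sketch below computes it for two annotators, assuming both have labelled the same, already-aligned units (for example the same set of IPU ends); how the paper matches segment boundaries between annotators before comparing labels is not described here. The label names are hypothetical placeholders, not the paper’s PCOMP inventory.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout; kappa is
        # undefined in this case, so we return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for six units (not the paper's label inventory).
annotator_1 = ["complete", "complete", "incomplete", "complete", "incomplete", "incomplete"]
annotator_2 = ["complete", "complete", "incomplete", "incomplete", "incomplete", "incomplete"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # → 0.667
```

Here p_o = 5/6 and p_e = 0.5, so κ ≈ 0.667. A library implementation such as scikit-learn’s `cohen_kappa_score` should give the same value for the same label sequences.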
📝 Abstract
This paper has two goals. First, we present the turn-taking annotation layers created for 95 minutes of conversational speech from the Graz Corpus of Read and Spontaneous Speech (GRASS), which are available to the scientific community. Second, we describe the annotation system and the annotation process in more detail, so that other researchers can use it for their own conversational data. The annotation system was developed with interdisciplinary applications in mind. It should be based on sequential criteria according to Conversation Analysis; it should be suitable for subsequent phonetic analysis, which is why the annotations were time-aligned in Praat; and it should be suitable for automatic classification, which required continuous annotation of the speech and a label inventory that is not too large and yields high inter-rater agreement. Turn-taking was annotated on two layers: Inter-Pausal Units (IPUs) and points of potential completion (PCOMPs; similar to transition relevance places). We provide a detailed description of the annotation process and of the segmentation and labelling criteria. A detailed analysis of inter-rater agreement and common confusions shows that agreement is near-perfect for IPU annotation and substantial for PCOMP annotation, and that disagreements are often either partial or can be explained by an alternative analysis of the sequence that also has merit. The annotation system can be applied to a variety of conversational data for linguistic studies and technological applications, and we hope that the annotations, as well as the annotation system, will contribute to stronger cross-fertilization between these disciplines.
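Since IPUs are stretches of speech bounded by pauses, an IPU layer can in principle be derived from a speaker’s time-aligned speech intervals by merging intervals separated by pauses shorter than a minimum duration. The sketch below does this, fills the remaining gaps with empty intervals so that the tier covers the whole recording without holes (as the continuous-annotation requirement implies), and writes the result in the long text TextGrid format that Praat reads. The 0.2 s pause threshold, the `"IPU"` label, and all function names are hypothetical; the threshold and label inventory actually used for GRASS are not given in this section.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds
    label: str


def merge_into_ipus(speech, min_pause=0.2):
    """Merge speech intervals separated by pauses shorter than min_pause (hypothetical threshold, in s)."""
    ipus = []
    for start, end in sorted(speech):
        if ipus and start - ipus[-1][1] < min_pause:
            # Pause too short (or overlap): extend the current IPU.
            ipus[-1][1] = max(ipus[-1][1], end)
        else:
            ipus.append([start, end])
    return [(s, e) for s, e in ipus]


def continuous_tier(ipus, total_duration, label="IPU"):
    """Insert empty intervals between IPUs so the tier spans [0, total_duration] without gaps."""
    tier, cursor = [], 0.0
    for start, end in ipus:
        if start > cursor:
            tier.append(Interval(cursor, start, ""))
        tier.append(Interval(start, end, label))
        cursor = end
    if cursor < total_duration:
        tier.append(Interval(cursor, total_duration, ""))
    return tier


def to_textgrid(tier, tier_name, total_duration):
    """Serialize one interval tier in Praat's long text TextGrid format."""
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(tier)}",
    ]
    for i, iv in enumerate(tier, start=1):
        text = iv.label.replace('"', '""')  # Praat escapes quotes by doubling them
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {iv.start}",
            f"            xmax = {iv.end}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"


# Hypothetical speech intervals (s) for one speaker in a 4 s excerpt.
speech = [(0.30, 1.00), (1.10, 1.80), (2.50, 3.20)]
ipus = merge_into_ipus(speech)
tier = continuous_tier(ipus, total_duration=4.0)
tg = to_textgrid(tier, "IPU speaker A", 4.0)
print(tg)
```

With these inputs, the 0.1 s pause is bridged and the 0.7 s pause is kept, giving two IPUs and a five-interval tier. Keeping silences as explicit empty intervals, rather than leaving gaps, is one way to make every point in time belong to exactly one interval, which simplifies frame-based automatic classification later.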