AI Summary
This study identifies a significant domain shift in speaker diarization (SD) for clinical dialogues featuring African-accented English: short utterances and high speech overlap lead to sharply increased false alarms and missed detections, severely degrading general-purpose model performance. To address this, we introduce the first controlled cross-domain (general-to-clinical) evaluation benchmark and propose a rigorous Diarization Error Rate (DER) protocol incorporating overlap-aware scoring. We further develop a conversation-level error decomposition framework and speaker profiling methodology to quantify sources of domain bias. Finally, we design a lightweight, reproducible domain adaptation approach: fine-tuning the segmentation module on accent-matched data. Experiments demonstrate substantial error reduction in clinical settings; however, residual performance gaps underscore the necessity of overlap-aware segmentation and balanced, accent-diverse training data curation.
Abstract
This study examines domain effects in speaker diarization for African-accented English. We evaluate multiple production and open systems on general and clinical dialogues under a strict DER protocol that scores overlap. A consistent domain penalty appears for clinical speech and remains significant across models. Error analysis attributes much of this penalty to false alarms and missed detections, aligning with short turns and frequent overlap. We test lightweight domain adaptation by fine-tuning a segmentation module on accent-matched data; it reduces error but does not eliminate the gap. Our contributions include a controlled benchmark across domains, a concise approach to error decomposition and conversation-level profiling, and an adaptation recipe that is easy to reproduce. Results point to overlap-aware segmentation and balanced clinical resources as practical next steps.
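As a reading aid, the error decomposition mentioned above can be illustrated with the standard frame-level definition of DER, in which missed detection, false alarm, and speaker confusion are each divided by the total reference speech, and overlapped frames count once per active speaker. This is a minimal sketch, not the paper's scoring code: the function name `der_components`, the frame-set input format, and the assumption that hypothesis labels have already been mapped to reference labels are all illustrative choices. It also applies no forgiveness collar.

```python
def der_components(ref_frames, hyp_frames):
    """Frame-level DER decomposition with overlap scored.

    ref_frames, hyp_frames: equal-length lists of sets of speaker labels,
    one set per frame (empty set = silence). Hypothesis labels are assumed
    to be already mapped onto reference labels (illustrative assumption).
    """
    if len(ref_frames) != len(hyp_frames):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(ref_frames, hyp_frames):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        missed += max(0, n_ref - n_hyp)          # reference speakers with no hypothesis slot
        false_alarm += max(0, n_hyp - n_ref)     # hypothesis speakers beyond the reference count
        confusion += min(n_ref, n_hyp) - n_correct  # matched slots carrying the wrong label
        total += n_ref                           # overlapped frames count once per speaker

    if total == 0:
        raise ValueError("reference contains no speech")

    return {
        "missed": missed / total,
        "false_alarm": false_alarm / total,
        "confusion": confusion / total,
        "der": (missed + false_alarm + confusion) / total,
    }


# Toy example: four frames, one of them overlapped in the reference.
ref = [{"A"}, {"A", "B"}, {"B"}, set()]
hyp = [{"A"}, {"A"}, {"A"}, {"B"}]
result = der_components(ref, hyp)
print(result)
```

In the toy example, the overlapped frame contributes one missed speaker, the third frame one confusion, and the final silent frame one false alarm, so each component is 0.25 of the four reference speaker-frames and the DER is 0.75. Short turns and frequent overlap raise the first two terms in exactly this way.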