🤖 AI Summary
This work addresses the difficulty of comparing automatic intelligibility assessment methods for pathological speech, a problem rooted in reliance on private datasets and inconsistent evaluation protocols. To this end, we propose PathBench, the first unified, publicly available benchmark for this task. PathBench integrates six public datasets and defines three standardized evaluation protocols: Matched Content, Extended, and Full. Within this multi-protocol framework, we systematically compare reference-free, reference-text, and reference-audio approaches, capturing both a linguist's preference for controlled stimuli and a machine learning specialist's preference for maximum data. We also introduce DArtP (Dual-ASR Articulatory Precision), a new reference-free metric that achieves the highest average correlation among reference-free methods. Together, the benchmark and its baselines establish a reproducible foundation for future research in pathological speech intelligibility assessment.
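The summary does not spell out DArtP's formulation. As one hedged reading of the "Dual-ASR" idea, the sketch below scores a recording by the agreement between two independent ASR systems' hypotheses: no reference transcript or healthy-speaker recording is needed, which is what makes such a metric reference-free. The model choices and the `dual_asr_precision` helper are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a dual-ASR, reference-free intelligibility score.
# Assumption (not from the paper): two independent ASR systems transcribe the
# same recording, and their mutual agreement serves as a proxy for articulatory
# precision, since imprecise articulation tends to degrade the two hypotheses
# in inconsistent ways.
from transformers import pipeline
import jiwer

asr_a = pipeline("automatic-speech-recognition", model="openai/whisper-small")
asr_b = pipeline("automatic-speech-recognition",
                 model="facebook/wav2vec2-base-960h")

def dual_asr_precision(wav_path: str) -> float:
    """Agreement score in [0, 1]; higher suggests more intelligible speech."""
    hyp_a = asr_a(wav_path)["text"].lower().strip()
    hyp_b = asr_b(wav_path)["text"].lower().strip()
    # Word error rate of one hypothesis measured against the other,
    # so no ground-truth transcript is required.
    return max(0.0, 1.0 - jiwer.wer(hyp_a, hyp_b))
```

Under this reading, a score near 1 means both recognizers heard the same words, while disordered articulation drives the hypotheses apart and the score toward 0.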
📝 Abstract
Automatic speech intelligibility assessment is crucial for monitoring speech disorders and therapy efficacy. However, existing methods are difficult to compare: research is fragmented across private datasets with inconsistent protocols. We introduce PathBench, a unified benchmark for pathological speech assessment built on public datasets. We compare reference-free, reference-text, and reference-audio methods across three protocols (Matched Content, Extended, and Full) that represent how a linguist (controlled stimuli) versus a machine learning specialist (maximum data) would approach the same data. We establish baselines across six datasets, enabling systematic evaluation of future methodological advances, and introduce Dual-ASR Articulatory Precision (DArtP), which achieves the highest average correlation among reference-free methods.
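Benchmarks of this kind typically score a metric by how well its per-utterance outputs track human intelligibility ratings; the abstract's "highest average correlation" refers to this kind of comparison. A minimal sketch, using placeholder numbers rather than PathBench data and rank correlation as one common choice:

```python
# Minimal sketch: rank-correlate an automatic metric's scores with human
# intelligibility ratings. All values below are placeholders, not PathBench data.
from scipy.stats import spearmanr

human_ratings = [0.92, 0.40, 0.75, 0.15, 0.60]  # listener intelligibility per utterance
metric_scores = [0.88, 0.35, 0.70, 0.25, 0.55]  # automatic scores, e.g. a DArtP-style metric

rho, p_value = spearmanr(human_ratings, metric_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```

An "average correlation" across the six datasets would then presumably be the mean of such per-dataset coefficients.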