AI Summary
Speech pathology severity assessment has long relied on expert annotations, which are subjective and inefficient. Existing automated approaches are limited either by their dependence on healthy speech or text references or by their susceptibility to spurious correlations. This paper introduces XPPG-PCA, the first reference-free, unsupervised, general-purpose assessment framework. It extracts robust speaker representations with x-vectors, builds phonetic posteriorgrams (PPGs), and uses principal component analysis (PCA) to automatically identify low-dimensional components that correlate strongly with pathology severity. By avoiding reliance on data shortcuts and mitigating noise interference, XPPG-PCA matches or outperforms supervised and reference-based baselines on three Dutch oral cancer datasets. It also shows strong robustness and generalizes across speech tasks. The code and models are publicly released.
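The PCA step described above can be illustrated with a minimal sketch. It assumes each speaker is already represented by one fixed-size embedding (a stand-in for an x-vector-style representation derived from PPGs) and uses synthetic data. The names `pca_severity_scores` and `spearman_rho` are hypothetical and do not come from the released code, and Spearman correlation is an assumed evaluation metric. How XPPG-PCA actually builds its embeddings, selects components, or orients their sign may differ.

```python
# Hypothetical sketch of unsupervised PCA-based severity scoring;
# not the released XPPG-PCA implementation.
import numpy as np


def pca_severity_scores(embeddings: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Project speaker embeddings onto their leading principal components.

    embeddings: (n_speakers, dim) array, one fixed-size vector per speaker
    (hypothetically, an x-vector-style embedding computed from PPG frames).
    Returns an (n_speakers, n_components) array of PCA scores.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def spearman_rho(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation for continuous data (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


# Synthetic stand-in data: a hidden severity value per speaker shifts the
# embedding along one random direction, plus isotropic Gaussian noise.
rng = np.random.default_rng(0)
n_speakers, dim = 50, 16
severity = rng.uniform(0.0, 1.0, size=n_speakers)
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
embeddings = 5.0 * np.outer(severity, direction) + rng.normal(
    scale=0.3, size=(n_speakers, dim)
)

# PCA is fitted without labels; severity is used only for evaluation.
scores = pca_severity_scores(embeddings)[:, 0]
# The sign of a principal component is arbitrary, so report |rho|.
print(f"|Spearman rho| = {abs(spearman_rho(scores, severity)):.3f}")
```

Because PCA never sees severity labels, the model cannot learn label-specific shortcuts; the trade-off is that the leading component only tracks severity if severity is the dominant source of variation among the embeddings.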
Abstract
Reliably evaluating the severity of a speech pathology is crucial in healthcare. However, the current reliance on expert evaluations by speech-language pathologists presents several challenges: although these experts are highly skilled, their assessments are subjective, time-consuming, and costly, which can limit the reproducibility of clinical studies and strain healthcare resources. While automated methods exist, they have significant drawbacks. Reference-based approaches require transcriptions or healthy speech samples, restricting them to read speech and limiting their applicability. Existing reference-free methods are also flawed: supervised models often learn spurious shortcuts from the data, while handcrafted features are often unreliable and restricted to specific speech tasks. This paper introduces XPPG-PCA (x-vector phonetic posteriorgram principal component analysis), a novel, unsupervised, reference-free method for speech severity evaluation. Using three Dutch oral cancer datasets, we demonstrate that XPPG-PCA performs comparably to, or better than, established reference-based methods. Our experiments confirm its robustness against data shortcuts and noise, showing its potential for real-world clinical use. Taken together, our results show that XPPG-PCA provides a robust, generalizable solution for the objective assessment of speech pathology, with the potential to significantly improve the efficiency and reliability of clinical evaluations across a range of disorders. An open-source implementation is available.