🤖 AI Summary
This study addresses the unclear separability of pathological speech markers from accent variation in the representation spaces used for multilingual voice-based health assessment. Confusing the two can lead to misdiagnosis of healthy non-native speakers or missed diagnosis in multilingual patients. The work proposes a systematic framework that quantifies the geometric disentanglement of affective, linguistic, and pathological attributes in embedding spaces across six corpora. It combines four clustering metrics (including the Silhouette coefficient), permutation tests, and trustworthiness analysis. Affective features form the most compact clusters (Silhouette 0.250), followed by pathological (0.141) and linguistic features (0.077). The overlap between pathological and linguistic features stays below 0.21, which exceeds the permutation null but remains bounded for clinical deployment. The research offers actionable guidelines for representation disentanglement and for equitable cross-lingual speech health systems.
📝 Abstract
Speech-based clinical tools are increasingly deployed in multilingual settings, yet it remains unclear whether pathological speech markers are geometrically separable from accent variation. If they are not, systems may misclassify healthy non-native speakers or miss pathology in multilingual patients. We propose a four-metric clustering framework to evaluate the geometric disentanglement of emotional, linguistic, and pathological speech features across six corpora and eight dataset combinations. A consistent hierarchy emerges: emotional features form the tightest clusters (Silhouette 0.250), followed by pathological (0.141) and linguistic (0.077) features. Confound analysis shows that pathological-linguistic overlap stays below 0.21, which exceeds the permutation null but remains bounded for clinical deployment. Trustworthiness analysis confirms the fidelity of the embeddings and the robustness of the geometric conclusions. Our framework provides actionable guidelines for equitable and reliable speech health systems across diverse populations.
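The abstract reports Silhouette scores and compares cluster structure against a permutation null. The following is a minimal sketch of those two ingredients, not the authors' pipeline. It assumes Euclidean distance on generic embedding vectors and a null built by shuffling the attribute labels. The paper's actual distance measure, its other three metrics, and its test design are not given here. The function names `silhouette` and `permutation_p_value` are hypothetical.

```python
import math
import random


def silhouette(points, labels):
    """Mean Silhouette coefficient s(i) = (b - a) / max(a, b), Euclidean distance.

    Assumes at least two distinct labels. Singleton clusters score 0 by convention.
    """
    n = len(points)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        # Accumulate distances from point i to every other point, grouped by cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if i == j:
                continue
            sums[labels[j]] += math.dist(points[i], points[j])
            counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean intra-cluster distance
        b = min(sums[c] / counts[c] for c in clusters if c != own and counts[c] > 0)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


def permutation_p_value(points, labels, n_perm=200, seed=0):
    """Compare the observed Silhouette with Silhouettes under shuffled labels (assumed null)."""
    rng = random.Random(seed)
    observed = silhouette(points, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if silhouette(points, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic 2-D "embeddings": two well-separated groups, 10 points each (illustrative only).
    rng = random.Random(1)
    pts = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(10)]
    pts += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(10)]
    labs = ["healthy"] * 10 + ["pathological"] * 10
    s, p = permutation_p_value(pts, labs)
    print(f"Silhouette={s:.3f}, permutation p={p:.4f}")
```

On real speech embeddings, scores such as the reported 0.250, 0.141, and 0.077 would come from calling `silhouette` with emotion, pathology, or language labels on the same vectors. A library implementation (e.g. scikit-learn's `silhouette_score`) would replace the O(n²) loop for larger corpora.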