🤖 AI Summary
Existing speech-based Parkinson's disease (PD) detection methods predominantly model whole utterances and neglect the diagnostic potential of finer-grained phonetic units such as phonemes, syllables, and words. To address this, we propose the first multilingual (English, Italian, Spanish) multi-granularity speech diagnosis framework. Our method combines an automated pipeline that extracts time-aligned phonemes, syllables, and words with a bidirectional LSTM and multi-head attention for cross-lingual, interpretable classification. We show systematically, for the first time, that phoneme-level features are the most discriminative of the three granularities across languages. Moreover, attention weights consistently highlight speech segments that established clinical protocols already treat as PD-relevant, such as sustained vowels and diadochokinetic syllables, which supports clinical interpretability. Evaluated on Italian, Spanish, and English datasets, the phoneme-level model achieves 93.78% AUROC and 92.17% accuracy, supporting both the efficacy and the cross-lingual generalizability of phoneme-level modeling.
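To make the interpretability claim concrete, here is a minimal sketch of how per-segment attention weights could be aggregated into a ranking of phonetic units. It is not the authors' analysis code: the function name `rank_units_by_attention`, the input layout (one label and one weight per segment), and the use of a simple per-label mean are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_units_by_attention(units, weights):
    """Return (label, mean_weight) pairs, most attended label first.

    `units` holds one phonetic label per segment and `weights` holds the
    attention weight assigned to that segment (hypothetical input layout).
    """
    if len(units) != len(weights):
        raise ValueError("units and weights must have the same length")

    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, weight in zip(units, weights):
        totals[label] += weight
        counts[label] += 1

    # Average over occurrences so frequent labels are not favored by count alone.
    means = {label: totals[label] / counts[label] for label in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy example: four segments with made-up attention weights.
ranked = rank_units_by_attention(["a", "t", "a", "s"], [0.5, 0.1, 0.2, 0.2])
```

Averaging per label is only one possible choice; summing the weights or normalizing by segment duration would produce different rankings.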
📝 Abstract
Parkinson's Disease (PD) affects over 10 million people worldwide, and up to 89% of patients develop speech impairments. Current speech-based detection systems analyze entire utterances, potentially overlooking the diagnostic value of specific phonetic elements. We developed a granularity-aware approach for multilingual PD detection using an automated pipeline that extracts time-aligned phonemes, syllables, and words from recordings. Using Italian, Spanish, and English datasets, we implemented a bidirectional LSTM with multi-head attention to compare diagnostic performance across these granularity levels. Phoneme-level analysis performed best, with an AUROC of 93.78% ± 2.34% and an accuracy of 92.17% ± 2.43%, which demonstrates enhanced diagnostic capability for cross-linguistic PD detection. Importantly, attention analysis revealed that the most informative speech features align with those used in established clinical protocols: sustained vowels (/a/, /e/, /o/, /i/) at the phoneme level, diadochokinetic syllables (/ta/, /pa/, /la/, /ka/) at the syllable level, and /pataka/ sequences at the word level. Source code will be available at https://github.com/jetliqs/clearpd.
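The extraction step described above can be illustrated with a minimal sketch that cuts a recording into labeled units, given time alignments that are assumed to already exist (for example, from a forced aligner). The `Segment` class, the `slice_segments` function, and the alignment format are hypothetical and are not taken from the paper's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-aligned unit (phoneme, syllable, or word); times in seconds."""
    label: str
    start: float
    end: float


def slice_segments(samples, sample_rate, segments):
    """Cut `samples` into (label, sub-sequence) pairs using the alignments."""
    pieces = []
    for seg in segments:
        if seg.end <= seg.start:
            raise ValueError(f"segment {seg.label!r} has non-positive duration")
        # Convert seconds to sample indices and clamp to the recording bounds.
        lo = max(0, round(seg.start * sample_rate))
        hi = min(len(samples), round(seg.end * sample_rate))
        pieces.append((seg.label, samples[lo:hi]))
    return pieces


# Toy example: a 2-second "recording" at 10 samples per second.
samples = list(range(20))
alignment = [
    Segment("p", 0.0, 0.3),
    Segment("a", 0.3, 1.0),
    Segment("t", 1.8, 2.5),  # extends past the end and gets clamped
]
pieces = slice_segments(samples, 10, alignment)
```

The same function works at any granularity because only the alignment list changes, so phoneme-, syllable-, and word-level inputs can be fed to the same downstream model for comparison.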