Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech

📅 2025-10-04
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Existing speech-based Parkinson's disease (PD) detection methods predominantly model whole utterances, neglecting the diagnostic potential of fine-grained phonetic units such as phonemes, syllables, and words. To address this, we propose a multilingual (English, Italian, Spanish) multi-granularity speech diagnosis framework. An automated pipeline extracts time-aligned phonemes, syllables, and words from recordings, and a bidirectional LSTM with multi-head attention classifies each granularity level so that their diagnostic performance can be compared. Phoneme-level features are the most discriminative, reaching 93.78% ± 2.34% AUROC and 92.17% ± 2.43% accuracy on the Italian, Spanish, and English datasets. Attention weights highlight the speech units used in established clinical protocols (sustained vowels, diadochokinetic syllables, and /pataka/ sequences), which makes the model's decisions clinically interpretable.

📝 Abstract
Parkinson's Disease (PD) affects over 10 million people worldwide, with speech impairments in up to 89% of patients. Current speech-based detection systems analyze entire utterances, potentially overlooking the diagnostic value of specific phonetic elements. We developed a granularity-aware approach for multilingual PD detection using an automated pipeline that extracts time-aligned phonemes, syllables, and words from recordings. Using Italian, Spanish, and English datasets, we implemented a bidirectional LSTM with multi-head attention to compare diagnostic performance across the different granularity levels. Phoneme-level analysis achieved superior performance with AUROC of 93.78% ± 2.34% and accuracy of 92.17% ± 2.43%. This demonstrates enhanced diagnostic capability for cross-linguistic PD detection. Importantly, attention analysis revealed that the most informative speech features align with those used in established clinical protocols: sustained vowels (/a/, /e/, /o/, /i/) at phoneme level, diadochokinetic syllables (/ta/, /pa/, /la/, /ka/) at syllable level, and /pataka/ sequences at word level. Source code will be available at https://github.com/jetliqs/clearpd.
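The extraction of time-aligned units described in the abstract can be pictured with a small example. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes some aligner has already produced (label, start, end) timestamps, and the names `Segment` and `slice_segments`, the 16 kHz sample rate, and the toy alignment are all hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Segment(NamedTuple):
    """One aligned unit (hypothetical structure, not from the paper)."""
    label: str      # a phoneme, syllable, or word
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds


def slice_segments(
    samples: Sequence[float], sample_rate: int, segments: Sequence[Segment]
) -> List[Tuple[str, Sequence[float]]]:
    """Cut a waveform into (label, samples) pairs using time alignments."""
    units = []
    for seg in segments:
        if seg.end_s <= seg.start_s:
            raise ValueError(f"empty or reversed segment: {seg}")
        # Convert seconds to sample indices and clamp to the waveform bounds.
        lo = max(0, round(seg.start_s * sample_rate))
        hi = min(len(samples), round(seg.end_s * sample_rate))
        units.append((seg.label, samples[lo:hi]))
    return units


# Toy data: one second of placeholder samples at an assumed 16 kHz rate.
waveform = list(range(16000))
alignment = [Segment("p", 0.00, 0.05), Segment("a", 0.05, 0.25)]
units = slice_segments(waveform, 16000, alignment)
```

The same function would serve all three granularity levels, since only the labels and boundaries change between phonemes, syllables, and words.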
Problem

Research questions and friction points this paper is trying to address.

Diagnosing Parkinson's Disease from speech across multiple languages
Overcoming limitations of utterance-level analysis in speech impairments
Identifying most informative phonetic elements for clinical alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity analysis of phonemes, syllables, and words
Bidirectional LSTM with attention for cross-lingual diagnosis
Time-aligned phonetic features matching clinical assessment protocols
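
One way to read attention weights as an interpretability signal is to average the weight each unit label receives across utterances and rank the labels. The sketch below illustrates that idea only: it applies a single softmax to hypothetical per-segment scores rather than the paper's multi-head attention inside a bidirectional LSTM, and the function names and toy scores are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one utterance's segment scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_units_by_attention(
    utterances: Sequence[Tuple[Sequence[str], Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Return (label, mean attention weight) pairs, highest weight first."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for labels, scores in utterances:
        if len(labels) != len(scores):
            raise ValueError("each segment needs exactly one score")
        for label, weight in zip(labels, softmax(scores)):
            totals[label] += weight
            counts[label] += 1
    means = {label: totals[label] / counts[label] for label in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy scores, invented for illustration (not results from the paper).
toy_utterances = [
    (["p", "a", "t"], [0.0, 2.0, 0.0]),
    (["a", "k"], [1.0, 0.0]),
]
ranking = rank_units_by_attention(toy_utterances)
```

With these toy scores the vowel "a" ranks first because it receives the largest weight in both utterances; on real data the ranking would depend entirely on the trained model.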
Ilias Tougui
International University of Rabat
Mehdi Zakroum
International University of Rabat
Mounir Ghogho
University Mohammed VI Polytechnic
Machine Learning, Signal Processing, Wireless Communication