Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data

📅 2025-06-02

🤖 AI Summary

This study systematically evaluates the effectiveness and generalizability of three pre-trained audio embeddings (OpenL3, VGGish, and Wav2Vec2.0) for binary Parkinson's disease (PD) classification from speech, addressing gaps in model fairness and robustness across speaker demographics and speech tasks. Method: The study analyzes multiple speech tasks from the NeuroVoz dataset, including diadochokinetic (DDK) and listen-and-repeat (LR) tasks, and evaluates each embedding in combination with traditional classifiers (SVM, Random Forest). Contribution/Results: OpenL3 achieves the highest accuracy and the strongest cross-task generalizability, leading on both DDK and LR tasks. Wav2Vec2.0 exhibits notably lower overall performance and is the only embedding to show significant gender bias, favoring male speakers on DDK tasks. Misclassified cases indicate that all models have limited robustness to atypical PD speech patterns. The work provides empirical evidence and methodological guidance for selecting pre-trained embeddings for speech-based PD detection, and highlights the need for fairness-aware evaluation in PD diagnostics.

📝 Abstract

Speech impairments are prevalent biomarkers for Parkinson's Disease (PD), motivating the development of diagnostic techniques using speech data for clinical applications. Although deep acoustic features have shown promise for PD classification, their effectiveness often varies due to individual speaker differences, a factor that has not been thoroughly explored in the existing literature. This study investigates the effectiveness of three pre-trained audio embeddings (OpenL3, VGGish, and Wav2Vec2.0) for PD classification. On the NeuroVoz dataset, OpenL3 outperforms the other two embeddings in diadochokinesis (DDK) and listen-and-repeat (LR) tasks, capturing critical acoustic features for PD detection. Only Wav2Vec2.0 shows significant gender bias, achieving more favorable results for male speakers in DDK tasks. The misclassified cases reveal challenges with atypical speech patterns, highlighting the need for improved feature extraction and model robustness in PD detection.
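
The gender-bias finding rests on comparing classifier performance between male and female speakers. The paper's exact metrics and significance test are not given here, so the following is only a minimal sketch of one way such a comparison could be set up: it computes per-group accuracy and the difference between two groups. The function names (`accuracy_by_group`, `accuracy_gap`) and the toy labels are hypothetical and are not taken from the paper or from NeuroVoz.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel sequences of true labels,
    predicted labels, and group tags (for example speaker sex)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group, group_a, group_b):
    """Signed accuracy difference: positive means group_a is favored."""
    return per_group[group_a] - per_group[group_b]


# Toy, made-up data (1 = PD, 0 = healthy control); NOT results from the paper.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]

per_group = accuracy_by_group(y_true, y_pred, sex)
gap = accuracy_gap(per_group, "M", "F")
print(per_group, gap)
```

A raw gap like this only describes a difference. Calling it significant, as the abstract does for Wav2Vec2.0 on DDK tasks, would additionally require a statistical test over speakers or cross-validation folds, which this sketch does not attempt.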

Problem

Research questions and friction points this paper is trying to address.

Evaluating pre-trained audio embeddings for Parkinson's Disease speech classification

Assessing gender bias in PD classification models like Wav2Vec2.0

Addressing challenges with atypical speech patterns in PD detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pre-trained audio embeddings for PD classification

OpenL3 excels in DDK and LR tasks

Wav2Vec2.0 shows gender bias in DDK
