🤖 AI Summary
This study addresses the lack of a standardized, reproducible evaluation benchmark for voice-based early Parkinson’s disease detection, a gap that has made results from existing studies difficult to compare. To close it, the authors introduce the first publicly available benchmark for this task. The benchmark uses a speaker-independent split and covers three common speech tasks. Methods are evaluated under different training-resource settings, with fine-grained breakdowns by dataset, aggregation level, gender, and disease stage. The benchmark provides a reproducible performance reference and practical insights intended to support the clinical translation of voice-driven approaches for early Parkinson’s detection.
📝 Abstract
Early-stage Parkinson's disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and EarlyPD definitions. To address this issue, we propose the first benchmark for speech-based EarlyPD detection, with a speaker-independent split designed for fair and replicable cross-method evaluation on researcher-accessible datasets. The benchmark covers three common speech tasks and evaluates methods under different training-resource settings. We also present multi-dimensional evaluation breakdowns by dataset, aggregation level, gender, and disease stage to support fine-grained comparisons and clinical adoption. Our results provide a replicable reference and actionable insights, and we encourage adoption of this publicly available benchmark to advance robust, clinically meaningful EarlyPD detection from speech.
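The abstract's two evaluation ideas, a speaker-independent split and breakdowns by aggregation level and subgroup, can be illustrated with a minimal sketch. This is not the benchmark's released code. The record fields (`speaker_id`, `label`, `gender`), the 20% test fraction, label-stratified sampling of speakers, and mean-score aggregation per speaker are all assumptions chosen for illustration. The key property is that no speaker contributes recordings to both training and test data, so a model cannot score well by recognizing voices instead of disease markers.

```python
import random
from collections import defaultdict
from statistics import mean


def speaker_independent_split(recordings, test_fraction=0.2, seed=0):
    """Split recordings so each speaker appears in exactly one partition.

    Each recording is a dict with hypothetical keys "speaker_id" and
    "label" (1 = EarlyPD, 0 = healthy control); every speaker is assumed
    to have a single label.
    """
    # Group recordings by speaker.
    by_speaker = defaultdict(list)
    for rec in recordings:
        by_speaker[rec["speaker_id"]].append(rec)

    # Stratify speakers by label so both classes reach the test partition.
    speakers_by_label = defaultdict(list)
    for spk, recs in by_speaker.items():
        speakers_by_label[recs[0]["label"]].append(spk)

    rng = random.Random(seed)
    test_speakers = set()
    for label in sorted(speakers_by_label):
        spks = sorted(speakers_by_label[label])  # sort for reproducibility
        rng.shuffle(spks)
        n_test = max(1, round(len(spks) * test_fraction))
        test_speakers.update(spks[:n_test])

    train = [r for r in recordings if r["speaker_id"] not in test_speakers]
    test = [r for r in recordings if r["speaker_id"] in test_speakers]
    return train, test


def speaker_level_scores(records, scores):
    """Aggregate per-recording EarlyPD scores into one mean score per speaker."""
    grouped = defaultdict(list)
    for rec, score in zip(records, scores):
        grouped[rec["speaker_id"]].append(score)
    return {spk: mean(vals) for spk, vals in grouped.items()}


def accuracy_by(records, predictions, key):
    """Accuracy per subgroup, e.g. key="gender" or a disease-stage field."""
    hits = defaultdict(list)
    for rec, pred in zip(records, predictions):
        hits[rec[key]].append(pred == rec["label"])
    return {group: sum(v) / len(v) for group, v in hits.items()}


if __name__ == "__main__":
    # Toy data: 10 speakers with 3 recordings each, alternating labels.
    recs = [{"speaker_id": f"s{i}", "label": i % 2}
            for i in range(10) for _ in range(3)]
    train, test = speaker_independent_split(recs)
    overlap = {r["speaker_id"] for r in train} & {r["speaker_id"] for r in test}
    print(len(train), len(test), overlap)
```

Recording-level and speaker-level results can differ noticeably, which is one reason to report both. A real protocol would also fix the split once and publish it, rather than re-sampling it per experiment.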