RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification

📅 2025-07-04

📈 Citations: 0

✨ Influential: 0

career value

208K/year

🤖 AI Summary

To address the limited interpretability of Parkinson’s disease (PD) speech classification models and their insufficient transparency for clinical decision-making, this paper proposes a cross-attention neural network that integrates interpretable acoustic features with self-supervised representations. We innovatively design an interpretable cross-attention mechanism that explicitly models associations between clinically relevant acoustic abnormalities—such as tremor and prosodic deviations—and deep latent representations. Additionally, we introduce a long-speech segmentation strategy to mitigate performance degradation on extended utterances. Evaluated on mainstream PD speech datasets, our model achieves state-of-the-art classification accuracy (mean accuracy: 94.2%) while generating stable, consistent, and clinically meaningful attribution maps. Experimental results demonstrate that high predictive accuracy and high interpretability can be simultaneously attained, establishing a trustworthy and deployable technical framework for AI-assisted, non-invasive early screening of PD.

Technology Category

Application Category

📝 Abstract

Parkinson's Disease (PD) affects over 10 million people globally, with speech impairments often preceding motor symptoms by years, making speech a valuable modality for early, non-invasive detection. While recent deep-learning models achieve high accuracy, they typically lack the explainability required for clinical use. To address this, we propose RECA-PD, a novel, robust, and explainable cross-attention architecture that combines interpretable speech features with self-supervised representations. RECA-PD matches state-of-the-art performance in Speech-based PD detection while providing explanations that are more consistent and more clinically meaningful. Additionally, we demonstrate that performance degradation in certain speech tasks (e.g., monologue) can be mitigated by segmenting long recordings. Our findings indicate that performance and explainability are not necessarily mutually exclusive. Future work will enhance the usability of explanations for non-experts and explore severity estimation to increase the real-world clinical relevance.

Problem

Research questions and friction points this paper is trying to address.

Develops explainable AI for early Parkinson's detection via speech

Combines interpretable features with self-supervised learning for clinical use

Addresses performance degradation in long speech recordings via segmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-attention architecture combining interpretable and self-supervised features

Segmentation of long recordings to mitigate performance degradation

Balanced high accuracy with clinically meaningful explainability

🔎 Similar Papers

No similar papers found.