AI Summary
Speech pathology classification faces two key challenges: gender-related acoustic bias and severe class imbalance caused by the scarcity of samples for rare disorders. To address these, we propose a gender-aware hierarchical modeling framework with two stages: (1) speaker gender identification and extraction of gender-specific acoustic features; and (2) gender-conditioned disease classification. We further introduce multi-scale resampling and time-warping data augmentation strategies to mitigate both bias and imbalance. Our model employs ResNet-50 for Mel-spectrogram analysis and is trained on a unified corpus comprising four public datasets. It achieves 97.63% accuracy and a 95.25% Matthews Correlation Coefficient (MCC), outperforming the single-stage baseline by 5.0 percentage points in MCC and setting a new state of the art. This advancement improves the clinical viability of AI-driven speech pathology diagnosis.
Abstract
AI-based voice analysis shows promise for disease diagnostics, but existing classifiers often fail to accurately identify specific pathologies because of gender-related acoustic variations and the scarcity of data for rare diseases. We propose a novel two-stage framework that first identifies gender-specific pathological patterns using ResNet-50 on Mel spectrograms, then performs gender-conditioned disease classification. We address class imbalance through multi-scale resampling and time-warping augmentation. Evaluated on a merged dataset from four public repositories, our two-stage architecture with time warping achieves state-of-the-art performance (97.63% accuracy, 95.25% MCC), a 5% MCC improvement over the single-stage baseline. This work advances voice pathology classification while reducing gender bias through hierarchical modeling of vocal characteristics.
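The abstract reports MCC alongside accuracy, which matters when rare pathologies make the classes imbalanced. The sketch below is an illustration only, not the authors' evaluation code. It computes accuracy and the standard multiclass generalization of MCC from label lists, using hypothetical toy labels (`healthy`, `rare`) that do not come from the paper's datasets.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient.

    Uses the confusion-matrix form: with s samples, c correct predictions,
    t_k true samples of class k and p_k predictions of class k,
    MCC = (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2)).
    Returns 0.0 when the denominator is zero (e.g. a constant predictor).
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    var_true = s * s - sum(v * v for v in t_counts.values())
    var_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(var_true * var_pred)
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced toy set: 95 common-class samples, 5 rare-class samples.
y_true = ["healthy"] * 95 + ["rare"] * 5
# A degenerate classifier that always predicts the majority class.
y_majority = ["healthy"] * 100

acc_majority = accuracy(y_true, y_majority)        # high despite missing every rare case
mcc_majority = multiclass_mcc(y_true, y_majority)  # zero: no correlation with the truth
```

On this toy set the majority-class predictor scores high accuracy but zero MCC, which is why reporting both metrics gives a more reliable picture under class imbalance.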