GeHirNet: A Gender-Aware Hierarchical Model for Voice Pathology Classification

2025-08-01
AI Summary
Speech pathology classification faces two key challenges: gender-related acoustic bias and severe class imbalance caused by the scarcity of rare-disorder samples. To address these, we propose a gender-aware hierarchical modeling framework comprising two stages: (1) accurate speaker gender identification and extraction of gender-specific acoustic features; and (2) gender-conditioned disease classification. We further introduce multi-scale resampling and time-warping data augmentation strategies to mitigate both bias and imbalance. Our model employs ResNet-50 for Mel-spectrogram analysis and is trained on a unified corpus comprising four public datasets. It achieves 97.63% accuracy and a 95.25% Matthews Correlation Coefficient (MCC), improving MCC by 5% over the single-stage baseline and setting a new state of the art. This advancement enhances the clinical viability of AI-driven speech pathology diagnosis.
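
The two-stage routing described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `make_hierarchical_classifier`, `toy_gender_model`, the 165 Hz threshold, and the class labels are all hypothetical placeholders standing in for the trained ResNet-50 models, and the sketch assumes stage one emits a single gender label.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], str]


def make_hierarchical_classifier(
    gender_model: Model,
    disease_models: Dict[str, Model],
) -> Callable[[Features], Tuple[str, str]]:
    """Build a predictor that routes each sample through two stages.

    Stage 1 predicts a gender label; stage 2 applies the disease
    classifier registered for that label.
    """

    def predict(features: Features) -> Tuple[str, str]:
        gender = gender_model(features)
        if gender not in disease_models:
            raise KeyError(f"no stage-two model registered for {gender!r}")
        return gender, disease_models[gender](features)

    return predict


def toy_gender_model(features: Features) -> str:
    # Placeholder rule (an assumption, not from the paper): treat the first
    # feature as a mean pitch in Hz and threshold it.
    return "female" if features[0] >= 165.0 else "male"


# Hypothetical stage-two stubs; real ones would be trained classifiers.
toy_disease_models: Dict[str, Model] = {
    "female": lambda features: "class_from_female_model",
    "male": lambda features: "class_from_male_model",
}

predict = make_hierarchical_classifier(toy_gender_model, toy_disease_models)
```

Calling `predict([220.0])` routes the sample to the stub registered under `"female"`, while `predict([110.0])` goes to the `"male"` stub. Keeping the routing separate from the models means either stage can be swapped without touching the other.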

Abstract
AI-based voice analysis shows promise for disease diagnostics, but existing classifiers often fail to accurately identify specific pathologies because of gender-related acoustic variations and the scarcity of data for rare diseases. We propose a novel two-stage framework that first identifies gender-specific pathological patterns using ResNet-50 on Mel spectrograms, then performs gender-conditioned disease classification. We address class imbalance through multi-scale resampling and time-warping augmentation. Evaluated on a merged dataset from four public repositories, our two-stage architecture with time warping achieves state-of-the-art performance (97.63% accuracy, 95.25% MCC), with a 5% MCC improvement over the single-stage baseline. This work advances voice pathology classification while reducing gender bias through hierarchical modeling of vocal characteristics.
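
Because the abstract reports MCC alongside accuracy, a short sketch of how the two metrics can be computed from label lists may help. It uses the standard multi-class generalization of MCC; whether the authors used exactly this formulation is an assumption, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def matthews_corrcoef(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multi-class Matthews Correlation Coefficient.

    Returns 0.0 when the denominator vanishes, for example when every
    prediction is the same class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correct predictions
    t_counts = Counter(y_true)                         # true count per class
    p_counts = Counter(y_pred)                         # predicted count per class
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    spread_true = s * s - sum(v * v for v in t_counts.values())
    spread_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = math.sqrt(spread_true * spread_pred)
    return numerator / denominator if denominator else 0.0
```

Unlike accuracy, MCC stays at 0 for a classifier that always predicts the majority class, which is why it is a useful companion metric when rare disorders are underrepresented.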
Problem

Research questions and friction points this paper is trying to address.

Classifying voice pathologies despite gender-related acoustic variations
Addressing data scarcity for rare diseases in voice pathology detection
Reducing gender bias in AI-based voice pathology classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage gender-aware hierarchical classification model
ResNet-50 on Mel spectrograms for pattern detection
Multi-scale resampling and time warping augmentation
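
As a rough illustration of time-warping augmentation, the sketch below stretches or compresses a sequence of spectrogram frames along the time axis with linear interpolation. The summary does not describe the paper's exact warping procedure, so this uniform stretch is a simplified stand-in; `time_warp`, `random_time_warp`, and the 0.8 to 1.2 rate range are assumptions chosen for the example.

```python
import random
from typing import List, Sequence

Frame = Sequence[float]


def time_warp(frames: Sequence[Frame], rate: float) -> List[List[float]]:
    """Resample frames along time; rate > 1 shortens, rate < 1 lengthens."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    n = len(frames)
    if n < 2:
        return [list(f) for f in frames]
    new_len = max(2, round(n / rate))
    warped = []
    for i in range(new_len):
        # Map output index i to a fractional position in the input sequence.
        pos = i * (n - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        # Linearly interpolate each frequency bin between neighbouring frames.
        warped.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return warped


def random_time_warp(
    frames: Sequence[Frame],
    rng: random.Random,
    low: float = 0.8,    # hypothetical lower bound on the stretch rate
    high: float = 1.2,   # hypothetical upper bound on the stretch rate
) -> List[List[float]]:
    """Apply time_warp with a rate drawn uniformly from [low, high]."""
    return time_warp(frames, rng.uniform(low, high))
```

Applying `random_time_warp` several times to samples of a rare class yields slightly different durations of the same recording, which is one way such augmentation can help offset class imbalance.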
Fan Wu
Centre for Digital Health Interventions, ETH Zurich, Zurich, Switzerland
Kaicheng Zhao
Institute of Mechanism Theory, Machine Dynamics and Robotics, RWTH Aachen University, Aachen, Germany
Elgar Fleisch
Professor for Information and Technology Management
Internet of Things, Information Management, Technology Management
Filipe Barata
ETH Zurich - Centre for Digital Health Interventions
Digital Biomarkers, Machine Learning, Digital Health, Ubiquitous Computing, Artificial Intelligence