🤖 AI Summary
Existing speech enhancement models generalize poorly across speakers when applied to pathological speech, such as that of Parkinson's disease patients. This paper proposes a few-shot personalized enhancement method: starting from a pretrained hybrid VAE-NMF model, it requires only a few seconds of clean speech from each speaker for fine-tuning, marking the first application of personalized fine-tuning to pathological speech enhancement. The approach substantially narrows the performance gap between pathological and neurotypical speakers, yielding a 4.2 dB higher signal-to-noise ratio (SNR) gain, and outperforms both the generic and the conventionally fine-tuned models for both speaker groups. Its core contribution is a lightweight, rapidly adaptable personalized enhancement framework that addresses the generalization bottleneck caused by high modeling bias and severe data scarcity in pathological speech.
📝 Abstract
The generalizability of speech enhancement (SE) models across speaker conditions remains largely unexplored, despite its critical importance for broader applicability. This paper investigates the performance of the hybrid variational autoencoder (VAE)-non-negative matrix factorization (NMF) model for SE, focusing primarily on its generalizability to pathological speakers with Parkinson's disease. We show that VAE models trained on large neurotypical datasets perform poorly on pathological speech. While fine-tuning these pre-trained models with pathological speech improves performance, a performance gap remains between neurotypical and pathological speakers. To address this gap, we propose using personalized SE models derived from fine-tuning pre-trained models with only a few seconds of clean data from each speaker. Our results demonstrate that personalized models considerably enhance performance for all speakers, achieving comparable results for both neurotypical and pathological speakers.
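The pipeline the abstract describes — pretrain on a large generic corpus, then personalize with only a few seconds of speaker-specific data — can be illustrated with the NMF half of the model alone. Below is a minimal numpy sketch using Euclidean multiplicative updates on a synthetic "spectrogram"; the paper's actual VAE-NMF formulation, objective, and speech data are not reproduced here, and all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 1e-9

def nmf(V, W, n_iter=300, update_W=True):
    """Euclidean multiplicative updates (Lee-Seung style).
    V : (freq, time) non-negative "power spectrogram".
    W : (freq, rank) dictionary; frozen when update_W=False."""
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def rel_err(V, W, H):
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

F, rank = 32, 5
# Generic "neurotypical" corpus: many frames from one latent basis.
B_gen = rng.random((F, rank))
V_gen = B_gen @ rng.random((rank, 400))
# "Pathological" speaker: a perturbed basis the generic model never saw,
# observed through only a few frames (the few-shot setting).
B_spk = B_gen + 0.5 * rng.random((F, rank))
V_spk = B_spk @ rng.random((rank, 20))

# 1) Pretrain a dictionary on the generic corpus.
W_pre, _ = nmf(V_gen, rng.random((F, rank)) + EPS)

# 2a) Generic model, frozen: only activations adapt to the new speaker.
_, H_frozen = nmf(V_spk, W_pre.copy(), update_W=False)
# 2b) Personalized model: warm-start from W_pre and fine-tune W as well.
W_ft, H_ft = nmf(V_spk, W_pre.copy(), n_iter=200)

err_frozen = rel_err(V_spk, W_pre, H_frozen)
err_ft = rel_err(V_spk, W_ft, H_ft)
# Fine-tuning the dictionary on a few frames fits the new speaker better.
```

The design point this toy mirrors is the paper's: a frozen generic model hits a floor set by the mismatch between the pretrained basis and the new speaker, while a brief warm-started fine-tune closes most of that gap using very little speaker data.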