🤖 AI Summary
Speaker-dependent models for voice-based health monitoring face a trade-off between accuracy and deployment efficiency, hindering real-time, scalable applications. Method: This paper applies meta-learning to voice-based health monitoring, proposing a paradigm for individualized, dynamic modelling of vocal fatigue from speech, specifically predicting time elapsed since last sleep. We design three meta-learning architectures of increasing complexity (ensemble-based distance models, prototypical networks, and transformer-based sequence models), leveraging pre-trained speech embeddings to enable few-shot adaptation on longitudinal real-world data. Contribution/Results: Evaluated on a large-scale dataset of over 10,000 utterances from 1,185 shift workers, all meta-learning approaches significantly outperform both conventional cross-sectional models and traditional mixed-effects models, with the transformer-based method achieving the strongest performance. Crucially, the framework eliminates the per-subject retraining inherent in mixed-effects modelling, delivering a scalable, low-latency approach to personalized voice-based health monitoring.
📝 Abstract
Speaker-dependent modelling can substantially improve performance in speech-based health monitoring applications. While mixed-effects models are commonly used for such speaker adaptation, they require computationally expensive retraining for each new observation, making them impractical in a production environment. We reformulate this task as a meta-learning problem and explore three approaches of increasing complexity: ensemble-based distance models, prototypical networks, and transformer-based sequence models. Using pre-trained speech embeddings, we evaluate these methods on a large longitudinal dataset of shift workers (N=1,185; 10,286 recordings), predicting time since sleep from speech as a proxy for fatigue, a symptom commonly associated with ill health. Our results show that all meta-learning approaches tested outperform both cross-sectional and conventional mixed-effects models, with the transformer-based method achieving the strongest performance.
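The few-shot adaptation idea behind the distance-based and prototypical approaches can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the squared-Euclidean distance metric, and the `temperature` parameter are illustrative assumptions. The point it shows is that a handful of labelled recordings from a new speaker (a support set of embedding/label pairs) personalizes the prediction at inference time, with no per-subject retraining, by softmax-weighting the support labels according to embedding similarity.

```python
import numpy as np

def prototypical_predict(support_emb, support_y, query_emb, temperature=1.0):
    """Hypothetical prototypical-style few-shot regression sketch.

    support_emb: (S, D) speech embeddings for one speaker's labelled recordings
    support_y:   (S,)   labels, e.g. hours since last sleep
    query_emb:   (Q, D) embeddings of new recordings from the same speaker
    Returns (Q,) predictions as distance-weighted averages of support labels.
    """
    # (Q, S) squared Euclidean distances between queries and support points
    d2 = ((query_emb[:, None, :] - support_emb[None, :, :]) ** 2).sum(axis=-1)
    # Softmax over negative distances: closer support recordings get more weight
    w = np.exp(-d2 / temperature)
    w /= w.sum(axis=1, keepdims=True)
    # Prediction is the weighted average of the support labels
    return w @ support_y

# Toy usage with 2-D stand-in embeddings: a query near the "2 hours awake"
# recording inherits a prediction close to that label.
support_emb = np.array([[0.0, 0.0], [10.0, 10.0]])
support_y = np.array([2.0, 16.0])
pred = prototypical_predict(support_emb, support_y, np.array([[0.1, 0.0]]))
```

Adaptation here is a single forward pass over the support set, which is what makes this family of methods low-latency compared with refitting a mixed-effects model per speaker.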