Meta-Learning Approaches for Speaker-Dependent Voice Fatigue Models

📅 2025-05-29
🤖 AI Summary
Problem: Speaker-dependent models for voice-based health monitoring face a trade-off between accuracy and deployment efficiency. Mixed-effects models adapt to individual speakers but must be retrained for each new observation, which hinders real-time, scalable applications. Method: This paper applies meta-learning to voice-based health monitoring, framing individualized modeling of fatigue as a few-shot task: predicting time elapsed since last sleep from speech. It evaluates three meta-learning approaches of increasing complexity (ensemble-based distance models, prototypical networks, and Transformer-based sequence models), all built on pre-trained speech embeddings and adapted to each speaker using their longitudinal, real-world recordings. Contribution/Results: On a large longitudinal dataset of 10,286 recordings from 1,185 shift workers, all meta-learning approaches outperform both cross-sectional models and conventional mixed-effects models, with the Transformer-based method performing best. Because the framework adapts to a new speaker without per-subject retraining, it offers a more scalable route to personalized voice-based health monitoring than mixed-effects modeling.

📝 Abstract
Speaker-dependent modelling can substantially improve performance in speech-based health monitoring applications. While mixed-effects models are commonly used for such speaker adaptation, they require computationally expensive retraining for each new observation, making them impractical in a production environment. We reformulate this task as a meta-learning problem and explore three approaches of increasing complexity: ensemble-based distance models, prototypical networks, and transformer-based sequence models. Using pre-trained speech embeddings, we evaluate these methods on a large longitudinal dataset of shift workers (N=1,185, 10,286 recordings), predicting time since sleep from speech as a proxy for fatigue, a symptom commonly associated with ill-health. Our results demonstrate that all meta-learning approaches tested outperformed both cross-sectional and conventional mixed-effects models, with a transformer-based method achieving the strongest performance.
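As a rough illustration of how a distance-based approach can personalize predictions without retraining, the sketch below predicts a new recording's time since sleep as a similarity-weighted average of the labels of the same speaker's earlier recordings (the support set). It is a generic kernel-weighted nearest-neighbour regressor, not the paper's ensemble-based distance model: the function name, the softmax over negative Euclidean distances, the `temperature` parameter and the use of hours as the label unit are all assumptions, and the ensembling step is omitted.

```python
import math
from typing import Sequence


def distance_weighted_prediction(
    query: Sequence[float],
    support_embeddings: Sequence[Sequence[float]],
    support_labels: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Predict a query label as a softmax-weighted average of support labels.

    Hypothetical helper: weights are a softmax over negative Euclidean distances
    between the query embedding and each support embedding, so nearby support
    recordings dominate. Nothing is refit when a speaker's support set grows.
    """
    if not support_embeddings or len(support_embeddings) != len(support_labels):
        raise ValueError("need a non-empty support set with one label per embedding")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Similarity score: closer embeddings get higher (less negative) scores.
    scores = [-math.dist(query, emb) / temperature for emb in support_embeddings]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return sum(w * y for w, y in zip(weights, support_labels)) / sum(weights)


# Example: two support recordings labelled 2 h and 10 h since sleep (assumed unit).
support = [[0.0, 0.0], [1.0, 1.0]]
labels = [2.0, 10.0]
print(distance_weighted_prediction([0.5, 0.5], support, labels))  # equidistant: 6.0
print(distance_weighted_prediction([0.0, 0.0], support, labels))  # closer to 2.0 than to 10.0
```

Lowering `temperature` pushes the weighting toward a 1-nearest-neighbour lookup; raising it moves the prediction toward the speaker's mean support label. Adapting to a new speaker only requires adding their labelled recordings to the support set, so no model parameters are re-estimated, in contrast with refitting a mixed-effects model for each new observation.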
Problem

Improving speaker-dependent voice fatigue modeling accuracy
Reducing computational cost in speech-based health monitoring
Evaluating meta-learning for fatigue prediction from speech
Innovation

Reformulate speaker adaptation as a meta-learning problem
Use pre-trained speech embeddings as model inputs
Transformer-based method achieves the strongest performance
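Reformulating speaker adaptation as a meta-learning problem typically means training on episodes: for each speaker, some labelled recordings serve as the support set used for adaptation and later recordings serve as queries to be predicted. The sketch below shows one plausible way to build such episodes from longitudinal data by splitting each speaker's recordings chronologically. The `Recording` fields, the fixed `n_support` size and the chronological split are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Recording(NamedTuple):
    """One labelled recording (hypothetical schema for illustration)."""
    speaker_id: str
    timestamp: float              # only the ordering of timestamps is used
    embedding: Tuple[float, ...]  # e.g. a pre-trained speech embedding
    hours_since_sleep: float      # regression target


Episode = Tuple[str, List[Recording], List[Recording]]


def build_episodes(recordings: List[Recording], n_support: int) -> List[Episode]:
    """Split each speaker's recordings chronologically into support and query sets.

    The first `n_support` recordings of a speaker form the support set; all later
    recordings become queries. Speakers without at least one query are skipped.
    """
    if n_support < 1:
        raise ValueError("n_support must be at least 1")
    by_speaker: Dict[str, List[Recording]] = defaultdict(list)
    for rec in recordings:
        by_speaker[rec.speaker_id].append(rec)

    episodes: List[Episode] = []
    for speaker_id in sorted(by_speaker):
        # Order this speaker's recordings in time; sorted() returns a new list.
        history = sorted(by_speaker[speaker_id], key=lambda r: r.timestamp)
        if len(history) <= n_support:
            continue  # not enough recordings to leave at least one query
        episodes.append((speaker_id, history[:n_support], history[n_support:]))
    return episodes
```

Splitting by time rather than at random keeps every query later than its support set, mirroring a deployment setting in which only a speaker's past recordings are available for adaptation.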
Roseline Polle
thymia, London, UK
Agnes Norbury
Thymia Limited
cognitive neuroscience, computational psychiatry, digital mental health
A. Georgescu
thymia, London, UK; Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
Nicholas Cummins
King's College London
Paralinguistics, Speech Processing, Mental Health, Machine Learning
S. Goria
thymia, London, UK