More Similar than Dissimilar: Modeling Annotators for Cross-Corpus Speech Emotion Recognition

2025-09-15
AI Summary
In speech emotion recognition (SER), existing models generalize poorly to new annotators because they either rely on consensus labels or require extensive re-annotation and fine-tuning. To address this, we propose a low-resource annotator-personalization framework. First, a multi-annotator pre-trained model is constructed to explicitly encode individual annotation preferences. Second, cross-corpus annotator similarity is measured via behavioral patterns, enabling retrieval of the most similar source annotator for a new target annotator. Finally, rapid adaptation is achieved using only 5 to 10 target-labeled utterances, without full-model fine-tuning: only lightweight similarity matching and linear prediction calibration are required. Evaluated on RAVDESS, CREMA-D, and other benchmarks, our method significantly outperforms off-the-shelf baselines (average +6.2% Unweighted Average Recall), making it the first SER approach to achieve cross-corpus, high-accuracy annotator adaptation with ultra-low sampling (<1% new annotations).
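
For readers unfamiliar with the metric quoted above, Unweighted Average Recall (UAR) is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that standard definition; the function name and the toy labels are illustrative and are not taken from the paper.

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in np.unique(y_true):
        mask = y_true == c                      # utterances whose true label is c
        recalls.append((y_pred[mask] == c).mean())
    return float(np.mean(recalls))

# Toy, imbalanced example (hypothetical labels: 0, 1, 2 are emotion classes).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2]
uar = unweighted_average_recall(y_true, y_pred)
accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
print(f"UAR={uar:.3f}, accuracy={accuracy:.3f}")
```

Here the per-class recalls are 3/4, 1/2, and 1/1, so UAR is 0.75 while plain accuracy is 5/7. The two diverge whenever classes are imbalanced, which is why SER results are often reported as UAR.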

Abstract
Speech emotion recognition systems often predict a consensus value generated from the ratings of multiple annotators. However, these models have limited ability to predict the annotation of any one person. Alternatively, models can learn to predict the annotations of all annotators. Adapting such models to new annotators is difficult as new annotators must individually provide sufficient labeled training data. We propose to leverage inter-annotator similarity by using a model pre-trained on a large annotator population to identify a similar, previously seen annotator. Given a new, previously unseen, annotator and limited enrollment data, we can make predictions for a similar annotator, enabling off-the-shelf annotation of unseen data in target datasets, providing a mechanism for extremely low-cost personalization. We demonstrate our approach significantly outperforms other off-the-shelf approaches, paving the way for lightweight emotion adaptation, practical for real-world deployment.
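
The idea of matching a new annotator to a previously seen one can be sketched as follows. This is only an illustration under stated assumptions: the text above does not specify the similarity measure, so the sketch uses simple label agreement on the enrollment utterances, and `source_preds`, `enrollment_labels`, and `most_similar_annotator` are hypothetical names. It assumes the pre-trained multi-annotator model has already produced one predicted label per source annotator for each enrollment utterance.

```python
import numpy as np

def most_similar_annotator(source_preds, enrollment_labels):
    """Pick the source annotator whose predicted labels best match the new annotator.

    source_preds:      int array, shape (num_source_annotators, num_enrollment_utterances)
    enrollment_labels: int array, shape (num_enrollment_utterances,)
    Returns (best_index, per_annotator_agreement).
    """
    source_preds = np.asarray(source_preds)
    enrollment_labels = np.asarray(enrollment_labels)
    # Assumption: similarity = fraction of enrollment utterances with matching labels.
    agreement = (source_preds == enrollment_labels[None, :]).mean(axis=1)
    return int(np.argmax(agreement)), agreement

# Toy enrollment set: 3 source annotators, 5 utterances labeled by the new annotator.
source_preds = np.array([
    [0, 1, 2, 1, 0],
    [0, 1, 1, 1, 0],
    [2, 2, 2, 0, 0],
])
enrollment_labels = np.array([0, 1, 1, 1, 2])

best, agreement = most_similar_annotator(source_preds, enrollment_labels)
print(best, agreement)
```

Once a match is found, predictions for the new annotator on unseen target data would simply reuse the model output for the selected source annotator, so no model weights need to be updated.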
Problem

Research questions and friction points this paper is trying to address.

Predicting individual annotators' emotion labels in speech emotion recognition
Adapting models to new annotators with limited data
Leveraging inter-annotator similarity for low-cost personalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging inter-annotator similarity for adaptation
Using a pre-trained model to identify similar annotators
Enabling low-cost personalization with limited enrollment data
James Tavernor
Computer Science and Engineering, University of Michigan, Ann Arbor, United States
Emily Mower Provost
Professor of Computer Science, University of Michigan
Emotion Recognition, Machine Learning, Emotion Perception