More Similar than Dissimilar: Modeling Annotators for Cross-Corpus Speech Emotion Recognition

📅 2025-09-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing speech emotion recognition (SER) models generalize poorly to new annotators: they either rely on consensus labels or require extensive re-annotation and fine-tuning. To address this, we propose a low-resource annotator-personalization framework with three steps. First, a model is pre-trained on multiple annotators so that it explicitly encodes individual annotation preferences. Second, annotator similarity is measured across corpora from annotation behavior, so that the most similar source annotator can be retrieved for a new target annotator. Third, rapid adaptation uses only 5–10 utterances labeled by the target annotator and needs no full-model fine-tuning, only lightweight similarity matching and linear prediction calibration. Evaluated on RAVDESS, CREMA-D, and other benchmarks, the method significantly outperforms off-the-shelf baselines (average +6.2% Unweighted Average Recall), making it the first SER approach to achieve accurate cross-corpus annotator adaptation with fewer than 1% new annotations.

📝 Abstract

Speech emotion recognition systems often predict a consensus value generated from the ratings of multiple annotators. However, these models have limited ability to predict the annotation of any one person. Alternatively, models can learn to predict the annotations of all annotators. Adapting such models to new annotators is difficult as new annotators must individually provide sufficient labeled training data. We propose to leverage inter-annotator similarity by using a model pre-trained on a large annotator population to identify a similar, previously seen annotator. Given a new, previously unseen, annotator and limited enrollment data, we can make predictions for a similar annotator, enabling off-the-shelf annotation of unseen data in target datasets, providing a mechanism for extremely low-cost personalization. We demonstrate our approach significantly outperforms other off-the-shelf approaches, paving the way for lightweight emotion adaptation, practical for real-world deployment.
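The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the pre-trained model has already produced one predicted label per enrollment utterance for each previously seen annotator, and it uses plain label agreement as the similarity measure. The agreement metric, the function name `most_similar_annotator`, and the annotator IDs are illustrative assumptions; the abstract does not specify how similarity is computed.

```python
from typing import Dict, Sequence


def most_similar_annotator(
    source_preds: Dict[str, Sequence[str]],
    enrollment_labels: Sequence[str],
) -> str:
    """Return the seen annotator whose predicted labels agree most often
    with the new annotator's enrollment labels (assumed similarity measure)."""
    if not source_preds:
        raise ValueError("need at least one source annotator")
    if not enrollment_labels:
        raise ValueError("need at least one enrollment label")

    def agreement(preds: Sequence[str]) -> float:
        # Fraction of enrollment utterances where the labels match.
        if len(preds) != len(enrollment_labels):
            raise ValueError("prediction/label length mismatch")
        matches = sum(p == y for p, y in zip(preds, enrollment_labels))
        return matches / len(enrollment_labels)

    # max() keeps the first annotator on ties (dict insertion order).
    return max(source_preds, key=lambda a: agreement(source_preds[a]))


# Hypothetical model outputs for two seen annotators on five enrollment utterances.
source_preds = {
    "ann_a": ["happy", "sad", "angry", "neutral", "sad"],
    "ann_b": ["happy", "neutral", "angry", "neutral", "happy"],
}
# Labels the new annotator gave to the same five utterances.
enrollment_labels = ["happy", "neutral", "angry", "sad", "happy"]

best = most_similar_annotator(source_preds, enrollment_labels)
```

Once a match is found, the model's predictions for that seen annotator would stand in for the new annotator on the rest of the target data, which is why no fine-tuning is needed in this setup.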

Problem

Research questions and friction points this paper is trying to address.

Predicting individual annotators' emotion labels in speech emotion recognition

Adapting models to new annotators with limited data

Leveraging inter-annotator similarity for low-cost personalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging inter-annotator similarity for adaptation

Using a pre-trained model to identify similar annotators

Enabling low-cost personalization with limited enrollment data

🔎 Similar Papers

The Whole Is Bigger Than the Sum of Its Parts: Modeling Individual Annotators to Capture Emotional Variability