π€ AI Summary
In speech emotion recognition (SER), existing models exhibit poor generalization to novel annotators due to reliance on consensus labels or requirements for extensive re-annotation and fine-tuning. To address this, we propose a low-resource annotator-personalization framework: first, a multi-annotator pre-trained model is constructed to explicitly encode individual annotation preferences; second, cross-corpus annotator similarity is measured via behavioral patterns, enabling retrieval of the most similar source annotator for a new target annotator; finally, rapid adaptation is achieved using only 5β10 target-labeled utterances, without full-model fine-tuningβonly lightweight similarity matching and linear prediction calibration are required. Evaluated on RAVDESS, CREMA-D, and other benchmarks, our method significantly outperforms off-the-shelf baselines (average +6.2% Unweighted Average Recall), establishing the first SER approach achieving cross-corpus, ultra-low-sampling (<1% new annotations), high-accuracy annotator adaptation.
π Abstract
Speech emotion recognition systems often predict a consensus value generated from the ratings of multiple annotators. However, these models have limited ability to predict the annotation of any one person. Alternatively, models can learn to predict the annotations of all annotators. Adapting such models to new annotators is difficult as new annotators must individually provide sufficient labeled training data. We propose to leverage inter-annotator similarity by using a model pre-trained on a large annotator population to identify a similar, previously seen annotator. Given a new, previously unseen, annotator and limited enrollment data, we can make predictions for a similar annotator, enabling off-the-shelf annotation of unseen data in target datasets, providing a mechanism for extremely low-cost personalization. We demonstrate our approach significantly outperforms other off-the-shelf approaches, paving the way for lightweight emotion adaptation, practical for real-world deployment.