🤖 AI Summary
This work addresses a limitation of traditional facial emotion recognition approaches that rely on single-label classification, which fails to capture the nuanced and often blended nature of emotions in real-world scenarios. To better model the complexity and perceptual ambiguity of human affect, the authors formulate emotion recognition as a probability distribution learning task. By leveraging the Valence-Arousal-Dominance (VAD) affective space, the method automatically re-annotates single-label facial images into probabilistic mixtures of both basic and compound emotions, enabling richer emotional representations and more psychologically plausible predictions. Preliminary experiments on existing datasets illustrate the advantages of this formulation for emotion representation fidelity and recognition plausibility. The re-annotated data have been publicly released, offering a new direction for affective computing research.
📝 Abstract
Facial emotion recognition has typically been cast as a single-label classification problem over six prototypical emotions. However, this is an oversimplification, unsuitable for representing the multifaceted spectrum of spontaneous emotional states, which most often result from a combination of multiple emotions contributing at different intensities. A promising direction explored recently is to cast emotion recognition as a distribution learning problem. Still, such approaches are limited in that research datasets are typically annotated with a single emotion class. In this paper, we contribute a novel approach to describe complex emotional states as probability distributions over a set of emotion classes. To do so, we propose a solution to automatically re-label existing datasets by exploiting the result of a study in which a large set of both basic and compound emotions is mapped to probability distributions in the Valence-Arousal-Dominance (VAD) space. In this way, given a face image annotated with VAD values, we can estimate the likelihood that it belongs to each of the distributions, so that emotional states can be described as a mixture of emotions, enriching their description while also accounting for the ambiguous nature of their perception. In a preliminary set of experiments, we illustrate the advantages of this solution and a new possible direction of investigation. Data annotations are available at https://github.com/jbcnrlz/affectnet-b-annotation.
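The re-labeling idea in the abstract can be sketched as follows: if each emotion category is modeled as a probability distribution in the 3D VAD space, then a face image annotated with a VAD triple can be converted into a soft label by evaluating each distribution's density at that point and normalizing. This is a minimal illustrative sketch, not the paper's implementation; the per-emotion Gaussian means and covariances below are made-up placeholders (the actual distributions come from the psychological study the authors cite), and the choice of Gaussians is itself an assumption.

```python
import numpy as np

# Hypothetical per-emotion distributions in VAD space, each a diagonal
# Gaussian (mean, covariance). These numbers are illustrative only.
EMOTIONS = {
    "happy":   (np.array([0.8, 0.5, 0.4]),   np.diag([0.02, 0.05, 0.05])),
    "sad":     (np.array([-0.6, -0.3, -0.3]), np.diag([0.04, 0.04, 0.05])),
    "angry":   (np.array([-0.5, 0.6, 0.3]),  np.diag([0.03, 0.04, 0.06])),
    "fearful": (np.array([-0.6, 0.6, -0.4]), np.diag([0.03, 0.05, 0.05])),
}

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density at point x."""
    d = x - mean
    norm = np.sqrt(((2 * np.pi) ** len(x)) * np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.inv(cov) @ d) / norm

def emotion_mixture(vad):
    """Soft label for one VAD annotation: per-emotion likelihoods,
    normalized so the weights sum to 1 (a mixture over emotions)."""
    x = np.asarray(vad, dtype=float)
    likes = {name: gaussian_pdf(x, mean, cov)
             for name, (mean, cov) in EMOTIONS.items()}
    total = sum(likes.values())
    return {name: lk / total for name, lk in likes.items()}

# A high-valence, moderately aroused face lands mostly on "happy",
# but the other emotions keep small nonzero weights.
weights = emotion_mixture([0.7, 0.4, 0.3])
```

The resulting soft label can then replace the original one-hot annotation as a distribution-learning target, which is the spirit of the re-annotation the paper proposes.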