Auditing Facial Emotion Recognition Datasets for Posed Expressions and Racial Bias

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies pervasive posedness and racial bias in facial expression recognition (FER) datasets: datasets advertised as “in-the-wild” contain substantial proportions of posed images, and mainstream FER models exhibit systematic misclassification (particularly mislabeling smiles as anger or sadness) for darker-skinned and non-White subjects. To address this, we propose a posed-image detection method that leverages multi-model prediction consistency and expert human annotation, integrated with skin chromaticity labeling and cross-model comparative analysis. We empirically evaluate the approach across three state-of-the-art FER models. Results reveal significant posedness rates in both audited public datasets, with model bias strongly correlated with posedness prevalence. Crucially, this work provides the first systematic empirical demonstration of the coupling mechanism between posedness and racial bias in FER. It advances fairer data curation by proposing guidelines for authentic, representative dataset collection and introducing a fairness evaluation framework grounded in posedness-aware analysis.
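The summary does not spell out the consistency criterion, so the following Python sketch illustrates one plausible reading of "multi-model prediction consistency": an image is flagged as likely posed when several independent FER models all agree with the dataset label at high confidence (posed expressions tend to be exaggerated and thus easy to classify), with flagged images then routed to expert human annotators. The `Model` interface, the unanimity rule, and the 0.9 threshold are illustrative assumptions, not the paper's published method.

```python
# Hypothetical sketch: flag images as likely posed when several independent
# FER models agree with the dataset label at high confidence. Flagged images
# would still go to expert human annotators for final judgment.
from typing import Callable, Sequence
import numpy as np

# Assumed interface: each model maps an image array to a probability
# vector over a shared emotion label set (e.g., happy, sad, angry, ...).
Model = Callable[[np.ndarray], np.ndarray]

def flag_likely_posed(
    image: np.ndarray,
    label_idx: int,
    models: Sequence[Model],
    conf_threshold: float = 0.9,  # illustrative threshold, not from the paper
) -> bool:
    """Return True if every model predicts the dataset label confidently."""
    for model in models:
        probs = model(image)
        if probs.argmax() != label_idx or probs[label_idx] < conf_threshold:
            return False
    return True
```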

📝 Abstract
Facial expression recognition (FER) algorithms classify facial expressions into emotions such as happy, sad, or angry. An evaluative challenge facing FER algorithms is the drop in performance when detecting spontaneous expressions compared to posed expressions. An ethical (and evaluative) challenge facing FER algorithms is that they tend to perform poorly for people of some races and skin colors. These challenges are linked to the data collection practices employed in the creation of FER datasets. In this study, we audit two state-of-the-art FER datasets. We take random samples from each dataset and examine whether images are spontaneous or posed. In doing so, we propose a methodology for identifying spontaneous or posed images. We discover a significant number of posed images in datasets purporting to consist of in-the-wild images. Since the performance of FER models varies between spontaneous and posed images, the performance of models trained on these datasets will not represent their true performance if such models were deployed in in-the-wild applications. We also observe the skin color of individuals in the samples, and test three models trained on each of the datasets to predict the facial expressions of people of various races and skin tones. We find that the audited FER models were more likely to predict that people labeled as not White, or determined to have dark skin, were showing a negative emotion such as anger or sadness, even when they were smiling. This bias makes such models prone to perpetuating harm in real-life applications.
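As a companion to the abstract's bias analysis, here is a minimal Python sketch of how such an audit could be scored: subjects are bucketed by a skin-tone proxy (the Individual Typology Angle, ITA, computed from CIELAB values of a facial skin patch), and the rate at which smiling images are misread as anger or sadness is compared across groups. The ITA formula is standard in skin-tone research; the grouping threshold and the record format are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical sketch of the audit's fairness analysis: group subjects by a
# skin-tone proxy, then compare how often images labeled "happy" are
# predicted as a negative emotion across groups.
import math
from collections import defaultdict

def ita_degrees(L: float, b: float) -> float:
    """Individual Typology Angle: arctan((L* - 50) / b*) in degrees,
    from the CIELAB L* and b* values of a facial skin patch."""
    return math.degrees(math.atan2(L - 50.0, b))

def skin_tone_group(ita: float) -> str:
    """Coarse lighter/darker split; the 28-degree cutoff is illustrative."""
    return "lighter" if ita > 28.0 else "darker"

def negative_rate_on_smiles(records):
    """records: iterable of (ita, true_label, predicted_label) tuples.
    Returns, per group, the fraction of 'happy' images predicted as
    anger or sadness."""
    counts = defaultdict(lambda: [0, 0])  # group -> [negatives, total]
    for ita, true_label, pred in records:
        if true_label != "happy":
            continue
        group = skin_tone_group(ita)
        counts[group][1] += 1
        if pred in {"anger", "sadness"}:
            counts[group][0] += 1
    return {g: neg / total for g, (neg, total) in counts.items() if total}
```

A gap between the "lighter" and "darker" rates returned by `negative_rate_on_smiles` would correspond to the disparity the abstract reports: smiles on darker-skinned subjects being misclassified as negative emotions more often.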
Problem

Research questions and friction points this paper is trying to address.

Auditing FER datasets for posed vs. spontaneous expression bias
Evaluating racial and skin color bias in FER model predictions
Assessing dataset impact on real-world FER algorithm performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auditing FER datasets for posed-expression bias
Proposing a methodology to identify spontaneous vs. posed images
Testing for racial bias in FER models' emotion predictions
Rina Khan
School of Computing, Queen’s University, Kingston, Ontario, Canada
Catherine Stinson
Assistant Professor of Philosophy and Computing, Queen's University at Kingston