Auditing Facial Emotion Recognition Datasets for Posed Expressions and Racial Bias

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies pervasive posedness and racial bias in facial expression recognition (FER) datasets: datasets advertised as “in-the-wild” contain substantial proportions of posed images, and mainstream FER models exhibit systematic misclassification (particularly mislabeling smiles as anger or sadness) for darker-skinned and non-White subjects. To address this, we propose a posed-image detection method that leverages multi-model prediction consistency and expert human annotation, integrated with skin chromaticity labeling and cross-model comparative analysis. We empirically evaluate the approach across three state-of-the-art FER models. Results reveal significant posedness rates in both audited public datasets, with model bias strongly correlated with posedness prevalence. Crucially, this work provides the first systematic empirical demonstration of the coupling mechanism between posedness and racial bias in FER. It advances fairer data curation by proposing guidelines for authentic, representative dataset collection and introducing a fairness evaluation framework grounded in posedness-aware analysis.
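The summary does not spell out the consistency criterion, so the following Python sketch illustrates one plausible reading of "multi-model prediction consistency": an image is flagged as likely posed when several independent FER models all agree with the dataset label at high confidence (posed expressions tend to be exaggerated and thus easy to classify), with flagged images then routed to expert human annotators. The `Model` interface, the unanimity rule, and the 0.9 threshold are illustrative assumptions, not the paper's published method.

```python
# Hypothetical sketch: flag images as likely posed when several independent
# FER models agree with the dataset label at high confidence. Flagged images
# would still go to expert human annotators for final judgment.
from typing import Callable, Sequence
import numpy as np

# Assumed interface: each model maps an image array to a probability
# vector over a shared emotion label set (e.g., happy, sad, angry, ...).
Model = Callable[[np.ndarray], np.ndarray]

def flag_likely_posed(
    image: np.ndarray,
    label_idx: int,
    models: Sequence[Model],
    conf_threshold: float = 0.9,  # illustrative threshold, not from the paper
) -> bool:
    """Return True if every model predicts the dataset label confidently."""
    for model in models:
        probs = model(image)
        if probs.argmax() != label_idx or probs[label_idx] < conf_threshold:
            return False
    return True
```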

📝 Abstract
Facial expression recognition (FER) algorithms classify facial expressions into emotions such as happy, sad, or angry. An evaluative challenge facing FER algorithms is the drop in performance when detecting spontaneous expressions compared to posed expressions. An ethical (and evaluative) challenge facing FER algorithms is that they tend to perform poorly for people of some races and skin colors. These challenges are linked to the data collection practices employed in the creation of FER datasets. In this study, we audit two state-of-the-art FER datasets. We take random samples from each dataset and examine whether images are spontaneous or posed. In doing so, we propose a methodology for identifying spontaneous or posed images. We discover a significant number of posed images in datasets purporting to consist of in-the-wild images. Since the performance of FER models varies between spontaneous and posed images, the performance of models trained on these datasets will not represent their true performance if such models were deployed in in-the-wild applications. We also observe the skin color of individuals in the samples, and test three models trained on each of the datasets to predict the facial expressions of people of various races and skin tones. We find that the audited FER models were more likely to predict that people labeled as not White, or determined to have dark skin, were showing a negative emotion such as anger or sadness, even when they were smiling. This bias makes such models prone to perpetuating harm in real-life applications.
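As a companion to the abstract's bias analysis, here is a minimal Python sketch of how such an audit could be scored: subjects are bucketed by a skin-tone proxy (the Individual Typology Angle, ITA, computed from CIELAB values of a facial skin patch), and the rate at which smiling images are misread as anger or sadness is compared across groups. The ITA formula is standard in skin-tone research; the grouping threshold and the record format are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical sketch of the audit's fairness analysis: group subjects by a
# skin-tone proxy, then compare how often images labeled "happy" are
# predicted as a negative emotion across groups.
import math
from collections import defaultdict

def ita_degrees(L: float, b: float) -> float:
    """Individual Typology Angle: arctan((L* - 50) / b*) in degrees,
    from the CIELAB L* and b* values of a facial skin patch."""
    return math.degrees(math.atan2(L - 50.0, b))

def skin_tone_group(ita: float) -> str:
    """Coarse lighter/darker split; the 28-degree cutoff is illustrative."""
    return "lighter" if ita > 28.0 else "darker"

def negative_rate_on_smiles(records):
    """records: iterable of (ita, true_label, predicted_label) tuples.
    Returns, per group, the fraction of 'happy' images predicted as
    anger or sadness."""
    counts = defaultdict(lambda: [0, 0])  # group -> [negatives, total]
    for ita, true_label, pred in records:
        if true_label != "happy":
            continue
        group = skin_tone_group(ita)
        counts[group][1] += 1
        if pred in {"anger", "sadness"}:
            counts[group][0] += 1
    return {g: neg / total for g, (neg, total) in counts.items() if total}
```

A gap between the "lighter" and "darker" rates returned by `negative_rate_on_smiles` would correspond to the disparity the abstract reports: smiles on darker-skinned subjects being misclassified as negative emotions more often.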
Problem

Research questions and friction points this paper is trying to address.

Auditing FER datasets for posed vs. spontaneous expression bias
Evaluating racial and skin color bias in FER model predictions
Assessing dataset impact on real-world FER algorithm performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auditing FER datasets for posed-expression bias
Proposing a methodology to identify spontaneous vs. posed images
Testing for racial bias in FER models' emotion predictions
Rina Khan
School of Computing, Queen’s University, Kingston, Ontario, Canada
Catherine Stinson
Assistant Professor of Philosophy and Computing, Queen's University at Kingston