🤖 AI Summary
This study systematically investigates sources of bias and fairness challenges in facial expression recognition (FER). Addressing the problem of demographic disparities in FER performance, we conduct a comprehensive empirical analysis across four benchmark datasets—AffectNet, ExpW, Fer2013, and RAF-DB—and six representative models: MobileNet, ResNet, Xception, ViT, CLIP, and GPT-4o-mini. Our method employs fine-grained demographic annotations, cross-dataset generalization evaluation, and quantitative fairness metrics—including Equalized Odds and Demographic Parity. Contrary to common assumptions, we reveal for the first time that high-accuracy Transformer-based models (ViT and GPT-4o-mini) exhibit significantly greater group-level bias than lightweight CNNs. We further demonstrate that data imbalance and model architecture jointly exacerbate fairness degradation, uncovering a strong accuracy–fairness trade-off. To support reproducibility and future research, we release an open-source, fully reproducible experimental framework—providing both theoretical insights and practical guidelines for developing fairer FER systems.
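The group-level fairness metrics named above (Demographic Parity and Equalized Odds) can be sketched for a binary one-vs-rest expression label as follows. This is a minimal illustration of the standard definitions, not code from the released framework; the function names and the toy arrays are hypothetical.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    # Largest difference in positive-prediction rate between any two groups.
    rates = [np.mean(y_pred[group == g]) for g in np.unique(group)]
    return max(rates) - min(rates)

def equalized_odds_gap(y_true, y_pred, group):
    # Worst gap across groups in true-positive rate or false-positive rate.
    tprs, fprs = [], []
    for g in np.unique(group):
        yt, yp = y_true[group == g], y_pred[group == g]
        tprs.append(np.mean(yp[yt == 1]))  # TPR for this group
        fprs.append(np.mean(yp[yt == 0]))  # FPR for this group
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy example: 8 samples, two demographic groups (0 and 1).
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(demographic_parity_gap(y_pred, group))          # → 0.25
print(equalized_odds_gap(y_true, y_pred, group))      # → 0.5
```

A model can score well on one metric and poorly on the other: in the toy data both groups have the same TPR (0.5), yet the positive-prediction rates and FPRs differ, which is why the study reports both metrics side by side.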
📝 Abstract
Building AI systems, including Facial Expression Recognition (FER) systems, involves two critical components: data and model design. Both significantly influence bias and fairness in FER tasks, yet bias and fairness issues in FER datasets and models remain underexplored. This study investigates sources of bias in FER datasets and models. Four common FER datasets (AffectNet, ExpW, Fer2013, and RAF-DB) are analyzed. The findings demonstrate that AffectNet and ExpW exhibit high generalizability despite data imbalances. Additionally, this research evaluates the bias and fairness of six deep models: three state-of-the-art convolutional neural network (CNN) models (MobileNet, ResNet, and Xception) and three Transformer-based models (ViT, CLIP, and GPT-4o-mini). Experimental results reveal that while GPT-4o-mini and ViT achieve the highest accuracy scores, they also display the highest levels of bias. These findings underscore the urgent need for new methodologies to mitigate bias and ensure fairness in datasets and models, particularly in affective computing applications. Implementation details are available at https://github.com/MMHosseini/bias_in_FER.