A Feature-level Bias Evaluation Framework for Facial Expression Recognition Models

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing facial expression recognition (FER) models exhibit demographic bias, yet public datasets lack ground-truth demographic annotations, and most fairness evaluations omit statistical significance testing, undermining the reliability of their conclusions. This paper introduces the first label-free, feature-level bias evaluation framework for FER. It combines feature-space analysis with a plug-and-play permutation-test module to quantify fairness, with statistical significance guarantees, across age, gender, and race. The method enables multi-attribute, cross-architecture benchmarking and systematically uncovers statistically significant demographic bias in mainstream FER models on a large-scale dataset, providing empirical guidance for architecture selection. Key contributions: (1) eliminating reliance on manually annotated demographic labels; (2) embedding statistical significance testing directly into the evaluation pipeline; and (3) establishing a reproducible, scalable paradigm for FER fairness analysis.

📝 Abstract
Recent studies on fairness have shown that Facial Expression Recognition (FER) models exhibit biases toward certain visually perceived demographic groups. However, the limited availability of human-annotated demographic labels in public FER datasets has constrained the scope of such bias analysis. To overcome this limitation, some prior works have resorted to pseudo-demographic labels, which may distort bias evaluation results. In this paper, we instead propose a feature-level framework for evaluating demographic biases in FER models when demographic labels are unavailable in the test set. Extensive experiments demonstrate that our method evaluates demographic biases more effectively than existing approaches that rely on pseudo-demographic labels. Furthermore, we observe that many existing studies do not include statistical testing in their bias evaluations, raising the concern that some reported biases may not be statistically significant but rather due to randomness. To address this issue, we introduce a plug-and-play statistical module to ensure the statistical significance of bias evaluation results. A comprehensive bias analysis based on the proposed module is then conducted across three sensitive attributes (age, gender, and race), seven facial expressions, and multiple network architectures on a large-scale dataset, revealing prominent demographic biases in FER and providing insights for selecting a fairer network architecture.
Problem

Research questions and friction points this paper is trying to address.

Evaluating bias in FER models without demographic labels
Addressing limitations of pseudo-demographic labels in bias analysis
Ensuring statistical significance in bias evaluation results
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-level bias evaluation without demographic labels
Plug-and-play statistical module for significance
Comprehensive bias analysis across multiple attributes
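The paper does not spell out the internals of its statistical module here, but the core idea it describes, testing whether an observed performance gap between demographic groups is statistically significant rather than random, is commonly implemented with a permutation test. The sketch below is an illustrative stand-in, not the authors' implementation: `permutation_test_gap`, its arguments, and the accuracy-gap test statistic are all assumptions for demonstration.

```python
import numpy as np

def permutation_test_gap(correct_a, correct_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for the accuracy gap between two groups.

    correct_a, correct_b: binary arrays, 1 if the model's prediction on
    that sample was correct, 0 otherwise (one array per demographic group).
    Returns a permutation p-value for the null hypothesis that both groups
    share the same accuracy (i.e., the observed gap arose by chance).
    """
    rng = np.random.default_rng(seed)
    correct_a = np.asarray(correct_a, dtype=float)
    correct_b = np.asarray(correct_b, dtype=float)
    n_a = len(correct_a)

    # Observed test statistic: absolute accuracy gap between groups.
    observed = abs(correct_a.mean() - correct_b.mean())

    # Under the null, group membership is exchangeable: shuffle the pooled
    # outcomes and recompute the gap to build the null distribution.
    pooled = np.concatenate([correct_a, correct_b])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = abs(pooled[:n_a].mean() - pooled[n_a:].mean())
        if gap >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

A reported bias (e.g., lower accuracy on one age group) would then be flagged as significant only when this p-value falls below a chosen threshold such as 0.05, which is the "plug-and-play" role the module plays in the evaluation pipeline.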
Tangzheng Lian
Centre for Robotics Research, Department of Engineering, King’s College London, WC2R 2LS London, U.K.
Oya Celiktutan
Reader in AI & Robotics (Associate Professor) | Director of SAIR Lab @Centre for Robotics Research
Multimodal Perception · Machine Learning · Human-Robot Interaction