🤖 AI Summary
This study addresses the unfair evaluation of Facial Affect Analysis (FAA) systems across skin tones by proposing the first trustworthiness-aware evaluation framework for skin-tone fairness. We identify three key limitations in prior work: reliance on the illumination-sensitive Individual Typology Angle (ITA) for skin-tone quantification; severe underrepresentation of dark-skinned samples (~2% in AffectNet); and coarse-grained fairness diagnostics. To overcome these, we adopt the perceptually grounded L*–H* color representation in place of ITA to improve the stability of skin-tone grouping, and we design a modular fairness analysis pipeline that integrates fine-grained fairness metrics (e.g., Equal Opportunity) with Grad-CAM-based attention visualization, enabling interpretable attribution of skin-tone bias. Experiments reveal substantial performance disparities: F1 and true positive rate (TPR) drop by up to 0.08 and 0.11, respectively, for darker skin groups. Moreover, the L*–H* representation is markedly more sensitive to fairness violations than ITA.
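Both quantities contrasted above are simple functions of CIELAB coordinates: ITA is the angle of $(b^*, L^*-50)$, while the hue angle $H^*$ depends only on the chromatic axes $a^*$ and $b^*$. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import math

def ita_degrees(L, b):
    """Individual Typology Angle from CIELAB L* and b*.

    Standard form is arctan((L* - 50) / b*) in degrees; atan2 is used
    here so b* = 0 is handled, and it agrees with arctan for b* > 0.
    Because ITA depends directly on L*, illumination shifts that move
    L* also move the ITA-based skin-tone group.
    """
    return math.degrees(math.atan2(L - 50.0, b))

def hue_angle_degrees(a, b):
    """CIELAB hue angle H* = atan2(b*, a*), mapped to [0, 360) degrees."""
    return math.degrees(math.atan2(b, a)) % 360.0
```

Grouping on $(L^*, H^*)$ rather than ITA separates lightness from hue, which is one plausible reason the paper finds the subgrouping more consistent under varying lighting.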
📝 Abstract
Understanding how facial affect analysis (FAA) systems perform across different demographic groups requires reliable measurement of sensitive attributes such as ancestry, often approximated by skin tone, whose measurement is itself highly sensitive to lighting conditions. This study compares two objective skin tone classification methods: the widely used Individual Typology Angle (ITA) and a perceptually grounded alternative based on Lightness ($L^*$) and Hue ($H^*$). Using AffectNet and a MobileNet-based model, we assess fairness across skin tone groups defined by each method. Results reveal a severe underrepresentation of dark skin tones ($\sim 2\%$), alongside fairness disparities in F1-score (up to 0.08) and TPR (up to 0.11) across groups. While ITA shows limitations due to its sensitivity to lighting, the $H^*$–$L^*$ method yields more consistent subgrouping and enables clearer diagnostics through metrics such as Equal Opportunity. Grad-CAM analysis further highlights differences in model attention patterns by skin tone, suggesting variation in feature encoding. To support future mitigation efforts, we also propose a modular fairness-aware pipeline that integrates perceptual skin tone estimation, model interpretability, and fairness evaluation. These findings emphasize the relevance of skin tone measurement choices in fairness assessment and suggest that ITA-based evaluations may overlook disparities affecting darker-skinned individuals.
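Equal Opportunity, the diagnostic metric named above, compares true positive rates across skin-tone groups; the reported TPR disparity of up to 0.11 is exactly such a gap. A minimal sketch of the computation, assuming binary labels and group tags per sample (names are illustrative, not the paper's implementation):

```python
def tpr(y_true, y_pred):
    """True positive rate: fraction of actual positives predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pos = sum(y_true)
    return tp / pos if pos else 0.0

def equal_opportunity_gap(y_true, y_pred, groups):
    """Max difference in TPR across groups (0 = Equal Opportunity holds).

    Returns (gap, per_group_tprs) so the worst-off group is identifiable.
    """
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        ts, ps = by_group.setdefault(g, ([], []))
        ts.append(t)
        ps.append(p)
    tprs = {g: tpr(ts, ps) for g, (ts, ps) in by_group.items()}
    return max(tprs.values()) - min(tprs.values()), tprs
```

For example, if a "light" group has TPR 0.90 and a "dark" group 0.79, the gap is 0.11, matching the scale of disparity the study reports. In a multi-class affect setting this would be computed per emotion class, one-vs-rest.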