Exploring Disparity-Accuracy Trade-offs in Face Recognition Systems: The Role of Datasets, Architectures, and Loss Functions

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Face recognition systems (FRSs) exhibit significant performance disparities across demographic groups, necessitating a rigorous understanding of how datasets, model architectures, and loss functions jointly affect accuracy and fairness, particularly in gender prediction. Method: a systematic empirical analysis across 7 real-world face datasets, 10 CNN/Transformer architectures, and 4 loss functions (e.g., ArcFace, cross-entropy), yielding 266 evaluation configurations. Contribution/Results: datasets possess intrinsic bias tendencies that can dominate, or even reverse, the direction of model-level gender bias. Models also fail to learn consistent gender representations across datasets, motivating a novel "component co-influence" framework. Crucially, dataset choice emerges as the primary driver of bias: identical models exhibit contradictory gender bias patterns across different in-the-wild datasets. The work establishes a reproducible fairness evaluation benchmark and provides actionable deployment guidelines for equitable FRS development.
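Among the loss functions named above, ArcFace is the additive angular-margin softmax loss widely used for face recognition. As a point of reference (illustrative only, not the paper's training code; the scale `s=64.0` and margin `m=0.5` below are common defaults, not values taken from this study), here is a minimal NumPy sketch of the idea: add a margin to the angle of the true class before the scaled softmax cross-entropy.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace-style logits: add an angular margin m to the true-class
    angle, then scale all cosines by s before softmax cross-entropy.
    embeddings: (N, D) features, weights: (C, D) class weights,
    labels: (N,) integer class ids."""
    # L2-normalize features and class weights so dot products are cosines
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T                                  # (N, C) cosine similarities
    theta = np.arccos(np.clip(cos, -1.0, 1.0))     # angles in [0, pi]
    # add the margin m only at the true-class positions
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * np.cos(theta + margin)

def cross_entropy(logits, labels):
    """Numerically stable softmax cross-entropy, averaged over the batch."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Setting `m=0` recovers a plain scaled-cosine softmax; a positive margin shrinks the true-class logit, which forces tighter angular clusters per identity during training.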

📝 Abstract
Automated Face Recognition Systems (FRSs), developed using deep learning models, are deployed worldwide for identity verification and facial attribute analysis. The performance of these models is determined by a complex interdependence among the model architecture, optimization/loss function, and dataset. Although FRSs have surpassed human-level accuracy, they continue to perform disparately across certain demographics. Given the ubiquity of these applications, it is extremely important to understand the impact of the three components (model architecture, loss function, and face image dataset) on the accuracy-disparity trade-off in order to design better, unbiased platforms. In this work, we perform an in-depth analysis of three FRSs for the task of gender prediction, with architectural modifications resulting in ten deep-learning models, coupled with four loss functions, and benchmark them on seven face datasets across 266 evaluation configurations. Our results show that all three components have an individual as well as a combined impact on both accuracy and disparity. We identify that datasets have an inherent property that causes them to perform similarly across models, independent of the choice of loss function. Moreover, the choice of dataset determines the model's perceived bias: the same model reports bias in opposite directions for three gender-balanced datasets of "in-the-wild" face images of popular individuals. Studying the facial embeddings shows that the models are unable to generalize a uniform definition of what constitutes a "female face" as opposed to a "male face", due to dataset diversity. We provide recommendations to model developers on using our study as a blueprint for model development and subsequent deployment.
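The accuracy-disparity trade-off at the heart of the abstract can be made concrete with a small sketch. The helper below is illustrative only (the paper's exact disparity metric is not specified here): it computes per-group accuracy for a binary gender-prediction task and reports the max-min accuracy gap as the disparity.

```python
import numpy as np

def group_disparity(y_true, y_pred, groups):
    """Per-group accuracy and the accuracy gap (disparity).
    y_true, y_pred: (N,) label arrays; groups: (N,) demographic
    group labels (e.g. 'f' / 'm')."""
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        accs[str(g)] = float((y_pred[mask] == y_true[mask]).mean())
    vals = list(accs.values())
    # disparity = gap between the best- and worst-served groups
    return accs, max(vals) - min(vals)
```

Under a metric like this, a "better" configuration is one that raises overall accuracy without widening the gap; the study's point is that the same model can land on opposite sides of that gap depending on the evaluation dataset.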
Problem

Research questions and friction points this paper is trying to address.

Analyzes accuracy-disparity trade-offs in face recognition systems.
Examines impact of datasets, architectures, and loss functions on bias.
Identifies dataset influence on model bias and gender prediction accuracy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes dataset impact on model bias.
Explores loss functions and architectures.
Benchmarks models across diverse datasets.