🤖 AI Summary
This study investigates whether synthetic data can simultaneously improve accuracy and fairness in face recognition. To address demographic bias, we propose FairFaceGen, a demographically balanced synthetic face dataset generated with Flux.1-dev and Stable Diffusion v3.5 (SD35) and enhanced for identity consistency via Arc2Face and multiple IP-Adapters. We systematically evaluate model performance and racial bias across standard benchmarks: LFW, AgeDB-30, IJB-B/C, and RFW. Results show that high-fidelity, demographically balanced synthetic data can meaningfully mitigate demographic bias. Crucially, the quantity and quality of intra-class synthetic augmentation are key determinants of both fairness and accuracy gains. SD35-generated data achieves superior performance on bias-sensitive benchmarks such as RFW, while exhibiting somewhat lower generalization than real data on IJB-B/C. Overall, this work demonstrates the feasibility and promise of synthetic-data-driven approaches for developing fairer, more robust face recognition systems.
📝 Abstract
Synthetic data has emerged as a promising alternative for training face recognition (FR) models, offering advantages in scalability, privacy compliance, and potential for bias mitigation. However, a critical question remains: can both high accuracy and fairness be achieved with synthetic data? In this work, we evaluate the impact of synthetic data on the bias and performance of FR systems. We generate a demographically balanced face dataset, FairFaceGen, using two state-of-the-art text-to-image generators, Flux.1-dev and Stable Diffusion v3.5 (SD35), and combine them with several identity augmentation methods, including Arc2Face and four IP-Adapters. By maintaining an equal identity count across synthetic and real datasets, we ensure fair comparisons when evaluating FR performance on standard (LFW, AgeDB-30, etc.) and challenging IJB-B/C benchmarks, and FR bias on the Racial Faces in-the-Wild (RFW) dataset. Our results demonstrate that although synthetic data still lags behind real data in generalization on IJB-B/C, demographically balanced synthetic datasets, especially those generated with SD35, show potential for bias mitigation. We also observe that the number and quality of intra-class augmentations significantly affect FR accuracy and fairness. These findings provide practical guidelines for constructing fairer FR systems using synthetic data.
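To make the balancing idea concrete, the sketch below shows one way a demographically balanced prompt set for a text-to-image generator could be constructed, with an equal number of synthetic identities per demographic group. The attribute lists and prompt template are illustrative assumptions, not the paper's actual prompts; the ethnicity labels simply mirror RFW's four evaluation groups.

```python
from itertools import product

# Assumed demographic attributes (not taken from the paper);
# the four ethnicity labels mirror RFW's evaluation groups.
ETHNICITIES = ["African", "Asian", "Caucasian", "Indian"]
GENDERS = ["male", "female"]

def balanced_prompts(ids_per_group: int) -> list[str]:
    """Return one text-to-image prompt per synthetic identity, with an
    equal number of identities for every (ethnicity, gender) pair."""
    prompts = []
    for eth, gen in product(ETHNICITIES, GENDERS):
        for i in range(ids_per_group):
            # Each prompt is intended to seed one distinct identity;
            # identity consistency across images would then be enforced
            # downstream (e.g. via Arc2Face or IP-Adapter conditioning).
            prompts.append(
                f"a high-quality portrait photo of a {eth} {gen} person, "
                f"identity {i}, neutral expression, front-facing"
            )
    return prompts

prompts = balanced_prompts(ids_per_group=5)
print(len(prompts))  # 4 ethnicities x 2 genders x 5 identities = 40
```

Feeding each prompt to the generator (Flux.1-dev or SD35) once per identity, then applying intra-class augmentation, would yield a dataset balanced by construction across the chosen demographic axes.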