On Applicability of Synthetic Datasets for Facial Expression Recognition

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two obstacles in facial expression recognition: class imbalance in publicly available datasets, and privacy constraints that hinder the construction of large-scale, balanced training data. The authors examine three complementary strategies for building privacy-preserving datasets: confidence-thresholded pseudo-labeling, diffusion model synthesis conditioned on demographic attributes, and task-aware GAN-based expression editing that preserves identity. The synthetic datasets DigiFace, DCFace, and EmoNet-Face BIG serve as unlabeled sources for pseudo-labeling, and FFHQ serves as the source for generative synthesis. Models with IR50 and POSTERv1 backbones are trained and evaluated on AffectNet, RAF-DB, and FER2013. The results indicate that synthetic data can substitute for or supplement real data, mitigating class imbalance and privacy limitations while achieving competitive recognition performance.

📝 Abstract

Facial Expression Recognition faces two core challenges. The first is class imbalance in public datasets, which skews the learning process and weakens generalization. The second is related to privacy and data collection constraints, which limit the sharing of facial images and restrict the creation of large, balanced datasets. To address these issues, we examine three complementary strategies for constructing privacy-preserving FER datasets in the standard seven discrete facial expression classes setting. Our strategies are: (i) pseudo-labeling large unlabeled face collections with a teacher model under a confidence-thresholding scheme, (ii) prompt-driven synthesis using diffusion models conditioned on demographic attributes, and (iii) task-aware GAN-based expression editing that modifies facial expression while preserving identity and realism. For training and evaluation, we employed widely adopted datasets, including AffectNet, RAF-DB, and FER2013. We utilized the synthetic datasets DigiFace, DCFace, and EmoNet-Face BIG as unlabeled sources for pseudo-labeling. Additionally, we utilized the FFHQ dataset as the source for generative synthesis. The main experiments are conducted using a classic CNN backbone, IR50, and we also explore a more complex architecture, POSTERv1, to assess its feasibility and robustness. Using cross-dataset evaluations, we analyze the trade-offs each strategy presents in curated datasets. The findings demonstrate how synthetic data can effectively substitute or be combined with real datasets to mitigate imbalance and privacy limitations. Code and generated datasets: https://www.github.com/AliAZ98/SyntFER
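
Strategy (i) in the abstract, pseudo-labeling under a confidence threshold, can be illustrated with a minimal sketch. This is not the authors' implementation: the class ordering, the 0.9 threshold, the function names, and the teacher outputs are all assumptions made up for illustration. A sample is kept only when the teacher's top softmax probability reaches the threshold.

```python
import math
from collections import Counter

# Seven discrete expression classes; this ordering is an assumption for the example.
CLASSES = ["neutral", "happy", "sad", "surprise", "fear", "disgust", "anger"]


def softmax(logits):
    """Convert raw teacher scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits, threshold=0.9):
    """Return (index, class, confidence) for samples the teacher is sure about.

    `threshold` is a hypothetical value, not one reported by the authors.
    """
    kept = []
    for idx, logits in enumerate(teacher_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            kept.append((idx, CLASSES[probs.index(confidence)], confidence))
    return kept


# Made-up teacher outputs for four unlabeled faces.
teacher_logits = [
    [0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # confident "happy"
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # uniform, so rejected
    [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0],  # confident "sad"
    [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.5],  # ambiguous, so rejected
]

kept = pseudo_label(teacher_logits)
class_counts = Counter(label for _, label, _ in kept)
print(kept)
print(class_counts)
```

Counting the accepted labels per class, as `class_counts` does here, is one simple way to see how much imbalance remains after thresholding, since a stricter threshold tends to discard more samples from classes the teacher finds hard.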

Problem

Research questions and friction points this paper is trying to address.

Facial Expression Recognition

class imbalance

privacy constraints

synthetic datasets

data collection limitations

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic data

facial expression recognition

diffusion models