SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data

📅 2024-10-13

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

Facial expression datasets are limited in scale because labeling is subjective, acquisition is costly, and privacy constraints restrict collection. This hinders deep learning models, especially foundation models. To address this, we propose SynFER, a synthesis framework that combines high-level textual descriptions with fine-grained control over facial Action Units (AUs). A semantic guidance technique steers generation, and a pseudo-label generator rectifies the expression labels of the synthetic images, improving both image fidelity and label reliability. Built on diffusion models, SynFER enables high-fidelity, controllable facial expression synthesis. A model trained solely on synthetic data equal in size to the AffectNet training set reaches 67.23% classification accuracy on AffectNet; scaling the synthetic set to five times that size raises accuracy to 69.84%. SynFER offers a scalable, controllable, and privacy-preserving way to generate training data for facial expression analysis where real labeled data are scarce.
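
For readers less familiar with conditional diffusion, one standard way to steer a sampler with two conditions at once is compositional classifier-free guidance, written out below. Here $x_t$ is the noisy image at step $t$, $\epsilon_\theta$ the noise predictor, $\varnothing$ the null condition, $c_{\text{txt}}$ and $c_{\text{AU}}$ the text and AU conditions, and $w_{\text{txt}}, w_{\text{AU}}$ guidance weights. These symbols are illustrative assumptions; the summary does not state that SynFER's semantic guidance uses this exact rule.

```latex
\hat{\epsilon}_t = \epsilon_\theta(x_t, \varnothing)
  + w_{\text{txt}}\,\bigl[\epsilon_\theta(x_t, c_{\text{txt}}) - \epsilon_\theta(x_t, \varnothing)\bigr]
  + w_{\text{AU}}\,\bigl[\epsilon_\theta(x_t, c_{\text{AU}}) - \epsilon_\theta(x_t, \varnothing)\bigr]
```

Setting $w_{\text{AU}} = 0$ recovers ordinary text-only classifier-free guidance; raising $w_{\text{AU}}$ pushes samples harder toward the requested AU configuration.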

📝 Abstract

Facial expression datasets remain limited in scale due to privacy concerns, the subjectivity of annotations, and the labor-intensive nature of data collection. This limitation poses a significant challenge for developing modern deep learning-based facial expression analysis models, particularly foundation models, that rely on large-scale data for optimal performance. To tackle this overarching and complex challenge, we introduce SynFER (Synthesis of Facial Expressions with Refined Control), a novel framework for synthesizing facial expression image data based on high-level textual descriptions as well as finer-grained, more precise control through facial action units. To ensure the quality and reliability of the synthetic data, we propose a semantic guidance technique to steer the generation process and a pseudo-label generator to help rectify the facial expression labels for the synthetic images. To demonstrate the generation fidelity and the effectiveness of the synthetic data from SynFER, we conduct extensive experiments on representation learning using both synthetic data and real-world data. Experimental results validate the efficacy of the proposed approach and the synthetic data. Notably, our approach achieves a 67.23% classification accuracy on AffectNet when training solely with synthetic data equivalent to the AffectNet training set size, which increases to 69.84% when scaling up to five times the original size. Our code will be made publicly available.
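
The abstract does not specify how the pseudo-label generator rectifies labels. As a purely hypothetical illustration of the general idea, the sketch below compares each synthetic image's intended expression with the softmax output of an assumed pretrained expression classifier. It keeps the intended label when the classifier's top class agrees, replaces it when the classifier confidently prefers another class, and otherwise discards the sample. The function name `rectify_label`, the default `threshold` of 0.7, and the use of AffectNet's eight classes are all assumptions, not SynFER's method.

```python
from typing import Optional, Sequence

# Assumption: AffectNet's eight expression classes, used here only for illustration.
CLASSES = ["neutral", "happy", "sad", "surprise", "fear", "disgust", "anger", "contempt"]


def rectify_label(intended: str, probs: Sequence[float], threshold: float = 0.7) -> Optional[str]:
    """Hypothetical rule: return a rectified label for one synthetic image, or None to discard it.

    intended: the expression the image was generated to show.
    probs: softmax scores from an assumed pretrained classifier, aligned with CLASSES.
    """
    if len(probs) != len(CLASSES):
        raise ValueError("probs must contain one score per class")
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Keep the intended label whenever the classifier's top class agrees with it.
    if best == CLASSES.index(intended):
        return intended
    # Overwrite the label only when the classifier confidently prefers another class.
    if probs[best] >= threshold:
        return CLASSES[best]
    # Otherwise the label is ambiguous, so the sample is dropped.
    return None
```

For example, an image generated as "sad" but scored 0.8 for "happy" would be relabeled "happy". The threshold trades dataset size for label purity: raising it discards more ambiguous samples instead of relabeling them.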

Problem

Research questions and friction points this paper is trying to address.

Limited scale of facial expression datasets hinders deep learning models

Need for synthetic data to boost facial expression recognition performance

Ensuring quality and reliability of synthetic facial expression data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic facial expression data generation

Textual and facial action unit control

Semantic guidance and pseudo-labeling techniques
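
The summary does not describe SynFER's conditioning interface. As a hypothetical illustration of pairing textual and AU control, the sketch below looks up a simplified, commonly cited FACS prototype for the requested expression (for example, AU6 + AU12 for happiness), lets the caller add extra AUs, and encodes the result as a multi-hot vector over a fixed AU vocabulary alongside a text prompt. The prototype table, `AU_VOCAB`, the prompt template, and `build_condition` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Simplified, commonly cited FACS prototypes (intensity letters dropped).
# Assumption: illustrative only, not SynFER's AU specification.
PROTOTYPE_AUS: Dict[str, List[int]] = {
    "happy": [6, 12],
    "sad": [1, 4, 15],
    "surprise": [1, 2, 5, 26],
    "fear": [1, 2, 4, 5, 7, 20, 26],
    "anger": [4, 5, 7, 23],
    "disgust": [9, 15, 16],
}

# Hypothetical AU vocabulary that fixes the order of the conditioning vector.
AU_VOCAB: List[int] = [1, 2, 4, 5, 6, 7, 9, 12, 15, 16, 20, 23, 26]


def build_condition(expression: str, extra_aus: Tuple[int, ...] = ()) -> Tuple[str, List[float]]:
    """Hypothetical helper: return a (text prompt, AU multi-hot vector) pair."""
    # Expressions without a prototype (e.g. "neutral") start from an empty AU set.
    aus = set(PROTOTYPE_AUS.get(expression, [])) | set(extra_aus)
    unknown = aus - set(AU_VOCAB)
    if unknown:
        raise ValueError(f"AUs not in vocabulary: {sorted(unknown)}")
    prompt = f"a close-up photo of a face, facial expression: {expression}"
    # 1.0 marks an active AU; 0.0 marks an inactive one.
    vector = [1.0 if au in aus else 0.0 for au in AU_VOCAB]
    return prompt, vector
```

A multi-hot vector only records which AUs are present; a system with finer control could store per-AU intensities instead, replacing the 1.0 entries with values in [0, 1].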

🔎 Similar Papers

Rethinking the Learning Paradigm for Facial Expression Recognition

2022-09-30 · arXiv.org · Citations: 4