🤖 AI Summary
Existing visual and vision-language affect recognition benchmarks suffer from three critical limitations: (1) a narrow affective spectrum that cannot capture nuanced states such as bitterness or euphoria; (2) ambiguous inter-class boundaries (e.g., shame vs. embarrassment); and (3) severe data biases, including pervasive facial occlusion and insufficient demographic diversity. To address these, we introduce EmoNet Face, the first high-fidelity, fine-grained affect recognition benchmark built on controllable synthetic imagery. Our approach comprises: (i) a cognitively grounded 40-category affect taxonomy that ensures perceptual discriminability; (ii) controllable face modeling to generate large-scale, fully visible, demographically balanced AI-rendered faces; and (iii) a multi-expert collaborative annotation protocol coupled with fairness-aware data balancing. We release three complementary sub-datasets (EmoNet HQ, Binary, and Big) together with the EmpathicInsight-Face model, which achieves human-expert-level annotation consistency on EmoNet Face (Cohen's κ = 0.89).
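The summary reports annotation consistency as Cohen's κ, which measures inter-rater agreement corrected for the agreement expected by chance. As a minimal illustration of the metric itself (not the paper's evaluation code), the sketch below computes κ for two hypothetical annotators labeling ten images; the toy labels are invented for demonstration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    from each rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical emotion labels from two annotators (for illustration only).
a = ["joy", "joy", "shame", "awe", "joy", "awe", "shame", "joy", "awe", "joy"]
b = ["joy", "joy", "shame", "awe", "awe", "awe", "shame", "joy", "joy", "joy"]
print(round(cohens_kappa(a, b), 3))  # → 0.677
```

A κ of 0.89, as reported for EmpathicInsight-Face, is conventionally read as near-perfect agreement, i.e., the model's labels agree with expert annotators about as consistently as experts agree with each other.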
📝 Abstract
Effective human-AI interaction relies on AI's ability to accurately perceive and interpret human emotions. Current benchmarks for vision and vision-language models are severely limited, offering a narrow emotional spectrum that overlooks nuanced states (e.g., bitterness, intoxication) and fails to distinguish subtle differences between related feelings (e.g., shame vs. embarrassment). Existing datasets also often use uncontrolled imagery with occluded faces and lack demographic diversity, risking significant bias. To address these critical gaps, we introduce EmoNet Face, a comprehensive benchmark suite. EmoNet Face features: (1) A novel 40-category emotion taxonomy, meticulously derived from foundational research to capture finer details of human emotional experiences. (2) Three large-scale, AI-generated datasets (EmoNet HQ, Binary, and Big) with explicit, full-face expressions and controlled demographic balance across ethnicity, age, and gender. (3) Rigorous, multi-expert annotations for training and high-fidelity evaluation. (4) EmpathicInsight-Face, a model we built that achieves human-expert-level performance on our benchmark. The publicly released EmoNet Face suite (taxonomy, datasets, and model) provides a robust foundation for developing and evaluating AI systems with a deeper understanding of human emotions.