🤖 AI Summary
Existing visual and vision-language affect recognition benchmarks suffer from three critical limitations: (1) a narrow affective spectrum that cannot capture nuanced states such as bitterness or euphoria; (2) ambiguous inter-class boundaries (e.g., shame vs. embarrassment); and (3) severe data biases, including pervasive facial occlusion and insufficient demographic diversity. To address these, we introduce EmoNet Face, the first high-fidelity, fine-grained affect recognition benchmark built on controllable synthetic imagery. Our approach comprises: (i) a cognitively grounded 40-category affect taxonomy that ensures perceptual discriminability; (ii) controllable face modeling to generate large-scale, fully visible, demographically balanced AI-rendered faces; and (iii) a multi-expert collaborative annotation protocol coupled with fairness-aware data balancing. We release three complementary sub-datasets (EmoNet HQ, Binary, and Big) together with the EmpathicInsight-Face model, which achieves human-expert-level annotation consistency on EmoNet Face (Cohen's κ = 0.89).
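The summary reports annotation consistency as Cohen's κ, which measures inter-rater agreement corrected for the agreement expected by chance. As a minimal illustration of the metric itself (not the paper's evaluation code), the sketch below computes κ for two hypothetical annotators labeling ten images; the toy labels are invented for demonstration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    from each rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical emotion labels from two annotators (for illustration only).
a = ["joy", "joy", "shame", "awe", "joy", "awe", "shame", "joy", "awe", "joy"]
b = ["joy", "joy", "shame", "awe", "awe", "awe", "shame", "joy", "joy", "joy"]
print(round(cohens_kappa(a, b), 3))  # → 0.677
```

A κ of 0.89, as reported for EmpathicInsight-Face, is conventionally read as near-perfect agreement, i.e., the model's labels agree with expert annotators about as consistently as experts agree with each other.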
📝 Abstract
Effective human-AI interaction relies on AI's ability to accurately perceive and interpret human emotions. Current benchmarks for vision and vision-language models are severely limited, offering a narrow emotional spectrum that overlooks nuanced states (e.g., bitterness, intoxication) and fails to distinguish subtle differences between related feelings (e.g., shame vs. embarrassment). Existing datasets also often use uncontrolled imagery with occluded faces and lack demographic diversity, risking significant bias. To address these critical gaps, we introduce EmoNet Face, a comprehensive benchmark suite. EmoNet Face features: (1) A novel 40-category emotion taxonomy, meticulously derived from foundational research to capture finer details of human emotional experiences. (2) Three large-scale, AI-generated datasets (EmoNet HQ, Binary, and Big) with explicit, full-face expressions and controlled demographic balance across ethnicity, age, and gender. (3) Rigorous, multi-expert annotations for training and high-fidelity evaluation. (4) EmpathicInsight-Face, a model we built that achieves human-expert-level performance on our benchmark. The publicly released EmoNet Face suite (taxonomy, datasets, and model) provides a robust foundation for developing and evaluating AI systems with a deeper understanding of human emotions.