Automatic Classifiers Underdetect Emotions Expressed by Men

πŸ“… 2026-01-08
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study addresses systematic gender bias in current emotion classification models, which exhibit significantly lower accuracy in recognizing emotions expressed by males. Leveraging a dataset of over one million self-annotated texts and employing a pre-registered experimental design, the research systematically evaluates gender disparities across 414 model–emotion category combinations. It presents the first large-scale evidence from self-reported data demonstrating that emotion recognition models consistently underestimate male-expressed emotions and quantifies the potential downstream impact of this bias. Results reveal that error rates for male-authored texts are significantly higher than those for female-authored texts across all model types and emotion categories, underscoring a critical gap in demographic fairness within existing affective computing technologies.
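
To make the comparison concrete, here is a minimal sketch of the error-rate comparison for a single model–emotion cell, assuming toy data and a hypothetical `predict_emotion` stand-in classifier; the paper's actual models, dataset, and pre-registered statistical design are more involved.

```python
# Sketch of one cell of the 414 model-emotion grid: compare per-gender
# error rates of an emotion classifier against self-annotated labels.
# `predict_emotion` and the posts below are hypothetical placeholders.
from scipy.stats import fisher_exact

# (text, self-reported emotion label, author gender) records.
posts = [
    ("I can't stop smiling today", "joy", "male"),
    ("Everything feels pointless", "sadness", "female"),
    # ...the actual dataset holds over one million self-annotated posts
]

def predict_emotion(text: str) -> str:
    """Stand-in for any off-the-shelf emotion classifier."""
    return "joy" if "smil" in text.lower() else "sadness"

# Count classification errors separately for each author gender.
errors = {"male": 0, "female": 0}
totals = {"male": 0, "female": 0}
for text, true_label, gender in posts:
    totals[gender] += 1
    if predict_emotion(text) != true_label:
        errors[gender] += 1

# 2x2 contingency table (rows: gender, columns: error / correct) and a
# Fisher exact test for whether error rates differ between genders.
table = [
    [errors["male"], totals["male"] - errors["male"]],
    [errors["female"], totals["female"] - errors["female"]],
]
odds_ratio, p_value = fisher_exact(table)
print(f"male error rate:   {errors['male'] / totals['male']:.2%}")
print(f"female error rate: {errors['female'] / totals['female']:.2%}")
print(f"Fisher exact p-value: {p_value:.3f}")
```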

πŸ“ Abstract
The widespread adoption of automatic sentiment and emotion classifiers makes it important to ensure that these tools perform reliably across different populations. Yet their reliability is typically assessed using benchmarks that rely on third-party annotators rather than the individuals experiencing the emotions themselves, potentially concealing systematic biases. In this paper, we use a unique, large-scale dataset of more than one million self-annotated posts and a pre-registered research design to investigate gender biases in emotion detection across 414 combinations of models and emotion-related classes. We find that across different types of automatic classifiers and various underlying emotions, error rates are consistently higher for texts authored by men compared to those authored by women. We quantify how this bias could affect results in downstream applications and show that current machine learning tools, including large language models, should be applied with caution when the gender composition of a sample is not known or variable. Our findings demonstrate that sentiment analysis is not yet a solved problem, especially in ensuring equitable model behaviour across demographic groups.
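
The abstract's caution about unknown or variable gender composition can be illustrated with a short, hypothetical calculation: if a classifier recalls the same emotion less reliably for men than for women (the recall numbers below are invented, not the paper's), samples with different gender mixes will yield different measured prevalences even when the true prevalence is identical.

```python
# Illustrative downstream effect of gender-skewed recall on measured
# emotion prevalence. All numbers here are hypothetical.
true_prevalence = 0.30                      # same true rate for everyone
recall = {"male": 0.60, "female": 0.75}     # invented per-gender recall

for male_share in (0.2, 0.5, 0.8):
    female_share = 1 - male_share
    measured = true_prevalence * (
        male_share * recall["male"] + female_share * recall["female"]
    )
    print(f"male share {male_share:.0%} -> measured prevalence {measured:.1%}")
```

Even though the true prevalence never changes, the male-heavy sample appears less emotional, which is the kind of spurious group difference the authors warn about in downstream applications.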
Problem

Research questions and friction points this paper is trying to address.

emotion detection
gender bias
automatic classifiers
sentiment analysis
model fairness
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-annotation
gender bias
emotion detection
pre-registered design
fairness in NLP
πŸ‘₯ Authors
Ivan Smirnov
University of Technology Sydney, Australia
S. Aroyehun
University of Konstanz, Germany
P. Plener
Medical University of Vienna, Austria; University of Ulm, Germany
David Garcia
Professor for Social and Behavioral Data Science, University of Konstanz. Also CSH Vienna and ETHZ
Computational social science · collective emotions · polarization · privacy · agent-based modeling