🤖 AI Summary
This study identifies, for the first time, systematic gender bias in large language models (LLMs) on affective theory-of-mind tasks, i.e., inferring emotional states ("How does this person feel?") from person descriptions and contextual cues. Method: We construct a standardized benchmark and conduct a quantitative fairness analysis across mainstream LLMs, evaluating both inference-time prompting strategies (e.g., few-shot, chain-of-thought) and training-stage debiasing techniques, including adversarial training and fairness-aware regularization. Contribution/Results: We find that prompt engineering alone yields negligible bias reduction, whereas fine-tuning-stage interventions substantially mitigate gender bias, achieving an average 42.3% reduction. The work moves beyond the limits of conventional prompting approaches, empirically establishing the critical role of training-phase interventions in improving fairness in affective reasoning, and it introduces a new paradigm for trustworthy evaluation and governance of LLMs' social-cognitive capabilities.
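The summary describes the evaluation only at a high level. As a minimal sketch of what a gender-swapped affective theory-of-mind probe could look like, the snippet below fills the same scenario with a female and a male persona, asks the "How does this person feel?" question, and counts how often the modal emotion attribution differs between the two variants. The scenario templates, the `query_model` callable, and the `emotion_gap` metric are illustrative assumptions, not the paper's actual benchmark or bias measure.

```python
from collections import Counter

# Hypothetical scenario templates; the paper's actual benchmark items are not shown here.
SCENARIOS = [
    "{name} was passed over for a promotion {pronoun} had worked toward for a year.",
    "{name} just received unexpected praise from {poss} manager in front of the team.",
]

PERSONAS = {
    "female": {"name": "Sarah", "pronoun": "she", "poss": "her"},
    "male": {"name": "James", "pronoun": "he", "poss": "his"},
}


def build_prompt(template: str, persona: dict) -> str:
    """Fill a scenario with a gendered persona and append the affective ToM question."""
    return template.format(**persona) + " How does this person feel? Answer with one emotion word."


def emotion_gap(query_model, scenarios=SCENARIOS, n_samples=20) -> float:
    """Fraction of scenarios whose modal emotion attribution differs across gender-swapped variants.

    `query_model(prompt) -> str` is a stand-in for whatever LLM API is under test.
    """
    mismatches = 0
    for template in scenarios:
        modal = {}
        for gender, persona in PERSONAS.items():
            prompt = build_prompt(template, persona)
            answers = [query_model(prompt).strip().lower() for _ in range(n_samples)]
            modal[gender] = Counter(answers).most_common(1)[0][0]
        if modal["female"] != modal["male"]:
            mismatches += 1
    return mismatches / len(scenarios)
```

A higher `emotion_gap` would indicate that the model's emotion attributions depend on the persona's gender rather than on the described situation alone.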
📝 Abstract
The rapid advancement of large language models (LLMs) and their growing integration into daily life underscore the importance of evaluating and ensuring their fairness. In this work, we examine fairness within the domain of emotional theory of mind, investigating whether LLMs exhibit gender biases when presented with a description of a person and their environment and asked, "How does this person feel?". Furthermore, we propose and evaluate several debiasing strategies, demonstrating that meaningful reductions in bias require training-based interventions rather than relying solely on inference-time approaches such as prompt engineering.
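The abstract does not specify how the training-based interventions are implemented. Purely as one illustration of what a fairness-aware regularizer could look like during fine-tuning, the sketch below adds a symmetric-KL penalty that pulls the predicted emotion distributions for gender-swapped pairs toward each other, on top of the usual cross-entropy task loss. The function name, the weighting parameter `lam`, and the symmetric-KL formulation are assumptions for illustration, not the paper's method.

```python
import torch.nn.functional as F


def fairness_regularized_loss(logits_f, logits_m, labels, lam=0.5):
    """Task loss plus a fairness penalty on gender-swapped pairs.

    `logits_f` / `logits_m`: model outputs for the female/male variants of the same
    scenario, shape (batch, num_emotions). `labels`: annotated emotion indices.
    `lam` weights the fairness term against the task loss.
    """
    # Task loss: both gendered variants should predict the annotated emotion.
    ce = F.cross_entropy(logits_f, labels) + F.cross_entropy(logits_m, labels)

    # Fairness loss: symmetric KL divergence between the two predictive distributions,
    # penalizing emotion attributions that change when only the gender cues change.
    log_p_f = F.log_softmax(logits_f, dim=-1)
    log_p_m = F.log_softmax(logits_m, dim=-1)
    kl = (F.kl_div(log_p_f, log_p_m, log_target=True, reduction="batchmean")
          + F.kl_div(log_p_m, log_p_f, log_target=True, reduction="batchmean"))

    return ce + lam * kl
```

In this formulation, setting `lam = 0` recovers plain fine-tuning, while larger values trade some task accuracy for smaller gaps between gender-swapped predictions.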