🤖 AI Summary
This study systematically evaluates the capability of large language models (LLMs) in fine-grained emotion recognition, specifically their accuracy in identifying psychologically grounded emotions such as the 27 emotion categories in GoEmotions. Method: We introduce the first cross-model benchmark grounded in a unified emotion framework from cognitive psychology, combining zero-shot and few-shot prompting with rigorous statistical significance testing (e.g., permutation tests). Contribution/Results: Our approach establishes psycholinguistically coherent evaluation criteria and uncovers systematic relationships between model architecture and emotion-generalization performance. Experiments show that GPT-4 significantly outperforms leading open- and closed-source LLMs on fine-grained emotion classification (p < 0.01), yet all models exhibit consistent biases when recognizing compound emotions. These findings provide both a theoretical foundation and empirical benchmarks for improving affective sensitivity in human–AI interaction.
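To make the significance-testing step concrete, below is a minimal sketch of a paired permutation test over per-example correctness, assuming boolean prediction records are available for two models; the function name, resampling count, and seed are illustrative choices, not details from the paper.

```python
import numpy as np

def paired_permutation_test(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on per-example accuracy.

    correct_a / correct_b: boolean arrays, one entry per test example,
    indicating whether model A / model B classified it correctly.
    Returns the p-value for the observed accuracy difference.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    observed = abs(a.mean() - b.mean())

    count = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two models are interchangeable,
        # so randomly swap the A/B outcomes for each example.
        swap = rng.random(a.size) < 0.5
        a_perm = np.where(swap, b, a)
        b_perm = np.where(swap, a, b)
        if abs(a_perm.mean() - b_perm.mean()) >= observed:
            count += 1
    # Add-one smoothing keeps the p-value strictly positive.
    return (count + 1) / (n_resamples + 1)
```

A paired test is the natural choice here because both models are scored on the same test examples, so per-example outcomes can be swapped rather than pooled.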
📝 Abstract
This work investigates the capabilities of large language models (LLMs) in detecting and understanding human emotions expressed through text. Drawing on emotion models from psychology, we adopt an interdisciplinary perspective that integrates insights from the computational and affective sciences. The main goal is to assess how accurately these models identify emotions expressed in textual interactions and to compare different models on this task. This research contributes to broader efforts to enhance human-computer interaction by making artificial intelligence technologies more responsive and sensitive to users' emotional nuances. Through comparison with a state-of-the-art model on the GoEmotions dataset, we gauge LLMs' effectiveness as systems for emotion analysis, paving the way for potential applications in fields that require a nuanced understanding of human language.
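As an illustration of the zero-shot setup this evaluation implies, the sketch below builds a label-constrained prompt from the published 27 GoEmotions categories (plus neutral) and filters the model's reply down to valid labels. The helper names are hypothetical, and the chat-completion client call is omitted; this is a sketch of the general technique, not the paper's actual implementation.

```python
# The 27 GoEmotions emotion categories plus "neutral".
GOEMOTIONS_LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

def build_zero_shot_prompt(text: str) -> str:
    """Ask the model to answer only with labels from a fixed inventory."""
    labels = ", ".join(GOEMOTIONS_LABELS)
    return (
        "Classify the emotions expressed in the text below. "
        f"Answer only with labels from this list: {labels}.\n\n"
        f"Text: {text}\nLabels:"
    )

def parse_labels(response: str) -> set[str]:
    """Keep only the comma-separated tokens that are valid labels."""
    candidates = (t.strip().lower() for t in response.split(","))
    return {c for c in candidates if c in GOEMOTIONS_LABELS}

# Usage sketch: send build_zero_shot_prompt(example) to an LLM of choice,
# then compare parse_labels(reply) against the gold GoEmotions annotations.
```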