🤖 AI Summary
This study systematically evaluates the capability of large language models (LLMs) in fine-grained emotion recognition, specifically their accuracy in identifying psychologically grounded emotions such as the 27 emotion categories in GoEmotions. Method: We introduce the first cross-model benchmark grounded in a unified emotion framework from cognitive psychology, combining zero-shot and few-shot prompting with rigorous statistical significance testing (e.g., permutation tests). Contribution/Results: Our approach establishes psycholinguistically coherent evaluation criteria and uncovers systematic relationships between model architecture and emotion-generalization performance. Experiments show that GPT-4 significantly outperforms leading open- and closed-source LLMs on fine-grained emotion classification (p < 0.01), yet all models exhibit consistent biases when recognizing compound emotions. These findings provide both a theoretical foundation and empirical benchmarks for improving affective sensitivity in human–AI interaction.
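To make the significance-testing step concrete, below is a minimal sketch of a paired permutation test over per-example correctness, assuming boolean prediction records are available for two models; the function name, resampling count, and seed are illustrative choices, not details from the paper.

```python
import numpy as np

def paired_permutation_test(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on per-example accuracy.

    correct_a / correct_b: boolean arrays, one entry per test example,
    indicating whether model A / model B classified it correctly.
    Returns the p-value for the observed accuracy difference.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    observed = abs(a.mean() - b.mean())

    count = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two models are interchangeable,
        # so randomly swap the A/B outcomes for each example.
        swap = rng.random(a.size) < 0.5
        a_perm = np.where(swap, b, a)
        b_perm = np.where(swap, a, b)
        if abs(a_perm.mean() - b_perm.mean()) >= observed:
            count += 1
    # Add-one smoothing keeps the p-value strictly positive.
    return (count + 1) / (n_resamples + 1)
```

A paired test is the natural choice here because both models are scored on the same test examples, so per-example outcomes can be swapped rather than pooled.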
📝 Abstract
This work investigates the capabilities of large language models (LLMs) in detecting and understanding human emotions expressed through text. Drawing on emotion models from psychology, we adopt an interdisciplinary perspective that integrates insights from the computational and affective sciences. The main goal is to assess how accurately these models identify emotions expressed in textual interactions and to compare different models on this task. This research contributes to broader efforts to enhance human-computer interaction by making artificial intelligence technologies more responsive and sensitive to users' emotional nuances. Through comparison with a state-of-the-art model on the GoEmotions dataset, we gauge LLMs' effectiveness as systems for emotion analysis, paving the way for potential applications in fields that require a nuanced understanding of human language.
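As an illustration of the zero-shot setup this evaluation implies, the sketch below builds a label-constrained prompt from the published 27 GoEmotions categories (plus neutral) and filters the model's reply down to valid labels. The helper names are hypothetical, and the chat-completion client call is omitted; this is a sketch of the general technique, not the paper's actual implementation.

```python
# The 27 GoEmotions emotion categories plus "neutral".
GOEMOTIONS_LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

def build_zero_shot_prompt(text: str) -> str:
    """Ask the model to answer only with labels from a fixed inventory."""
    labels = ", ".join(GOEMOTIONS_LABELS)
    return (
        "Classify the emotions expressed in the text below. "
        f"Answer only with labels from this list: {labels}.\n\n"
        f"Text: {text}\nLabels:"
    )

def parse_labels(response: str) -> set[str]:
    """Keep only the comma-separated tokens that are valid labels."""
    candidates = (t.strip().lower() for t in response.split(","))
    return {c for c in candidates if c in GOEMOTIONS_LABELS}

# Usage sketch: send build_zero_shot_prompt(example) to an LLM of choice,
# then compare parse_labels(reply) against the gold GoEmotions annotations.
```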