🤖 AI Summary
To address the lack of benchmark datasets and the challenges of fine-grained emotion modeling for Aspect-Based Emotion Analysis (ABEA) in social media, this paper introduces the first ABEA dataset specifically designed for tweets, comprising 2,621 English instances. Annotations follow Shaver's emotion hierarchy, covering the basic emotions Anger, Sadness, Happiness, and Fear plus a "None" class, and support Aspect Term Extraction (ATE) and Aspect Emotion Classification (AEC) as joint tasks. Key contributions include: (1) establishing the first systematic ABEA benchmark; (2) adapting the GRACE model for fine-grained emotion recognition; and (3) employing collaborative group annotation with majority voting to ensure label quality. Experimental results show an F1-score of 70.1% on the ATE subtask and 46.9% on the joint task, revealing that the limited data volume and the complexity of multi-class emotion classification remain the primary bottlenecks.
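The majority-voting step used for label quality can be sketched as follows. This is a minimal illustration: the three-annotator setup, the strict-majority threshold, and the fallback to "None" on ties are assumptions for the sketch, not details reported by the paper.

```python
from collections import Counter

def majority_vote(labels, fallback="None"):
    """Aggregate per-annotator emotion labels for one aspect term.

    Returns the label chosen by a strict majority of annotators;
    falls back to `fallback` otherwise (tie-breaking rule assumed).
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else fallback

# e.g. three hypothetical annotators labeling one aspect term
print(majority_vote(["Anger", "Anger", "Sadness"]))  # -> Anger
print(majority_vote(["Anger", "Sadness", "Fear"]))   # -> None
```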
📝 Abstract
While sentiment analysis has advanced from sentence-level to aspect-level, i.e., the identification of concrete terms related to a sentiment, the equivalent field of Aspect-based Emotion Analysis (ABEA) faces dataset bottlenecks and the increased complexity of emotion classes in contrast to binary sentiments. This paper addresses these gaps by generating a first ABEA training dataset, consisting of 2,621 English tweets, and fine-tuning a BERT-based model for the ABEA sub-tasks of Aspect Term Extraction (ATE) and Aspect Emotion Classification (AEC). The dataset annotation process was based on the hierarchical emotion theory by Shaver et al. [1] and made use of group annotation and majority-voting strategies to facilitate label consistency. The resulting dataset contains aspect-level emotion labels for Anger, Sadness, Happiness, Fear, and a None class. Using the new ABEA training dataset, the state-of-the-art ABSA model GRACE by Luo et al. [2] was fine-tuned for ABEA. The results reflect a performance plateau at an F1-score of 70.1% for ATE and 46.9% for joint ATE and AEC extraction. The limiting factors for model performance were broadly identified as the small training dataset size coupled with the increased task complexity, causing model overfitting and a limited ability to generalize to new data.
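The joint ATE + AEC task can be pictured as token-level sequence labeling, where each aspect term carries an emotion class. The BIO-with-emotion-suffix encoding and the example sentence below are illustrative assumptions, not necessarily the exact scheme used by GRACE:

```python
def decode_aspects(tokens, tags):
    """Collect (aspect term, emotion) pairs from a BIO-tagged
    sequence where tags look like 'B-Anger' / 'I-Anger' / 'O'."""
    aspects, current, emotion = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # new aspect term begins
            if current:
                aspects.append((" ".join(current), emotion))
            current, emotion = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)           # aspect term continues
        else:                             # outside any aspect term
            if current:
                aspects.append((" ".join(current), emotion))
            current, emotion = [], None
    if current:
        aspects.append((" ".join(current), emotion))
    return aspects

# Hypothetical tweet: the aspect "delayed flight" labeled with Anger
tokens = ["The", "delayed", "flight", "ruined", "my", "trip"]
tags   = ["O", "B-Anger", "I-Anger", "O", "O", "O"]
print(decode_aspects(tokens, tags))  # -> [('delayed flight', 'Anger')]
```

Under this framing, ATE corresponds to getting the B/I/O spans right, while AEC additionally requires the correct emotion suffix, which is why joint-task F1 (46.9%) sits well below ATE-only F1 (70.1%).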