BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

📅 2025-02-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Low-resource languages suffer from severe performance bottlenecks in sentiment analysis due to the scarcity of high-quality labeled data. To address this, we introduce BRIGHTER, a multilingual, multi-label sentiment dataset covering 28 languages—including numerous low-resource varieties from Africa, Asia, Eastern Europe, and Latin America—annotated collaboratively by native speakers across diverse domains under a rigorous quality control framework. This work presents the first systematic multilingual, multi-label benchmark supporting joint recognition of sentiment categories and intensity levels. We propose a collaborative annotation framework and data governance methodology specifically designed for low-resource settings. We establish new state-of-the-art baselines on monolingual and cross-lingual multi-label classification tasks, revealing substantial inter-lingual performance disparities. Furthermore, we empirically delineate the effectiveness boundaries of LLM-augmented strategies, achieving an average 32.7% F1-score improvement across all 28 languages.

Technology Category

Application Category

📝 Abstract
People worldwide use language in subtle and complex ways to express emotions. While emotion recognition -- an umbrella term for several NLP tasks -- significantly impacts different applications in NLP and other fields, most work in the area is focused on high-resource languages. Therefore, this has led to major disparities in research and proposed solutions, especially for low-resource languages that suffer from the lack of high-quality datasets. In this paper, we present BRIGHTER-- a collection of multilabeled emotion-annotated datasets in 28 different languages. BRIGHTER covers predominantly low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances from various domains annotated by fluent speakers. We describe the data collection and annotation processes and the challenges of building these datasets. Then, we report different experimental results for monolingual and crosslingual multi-label emotion identification, as well as intensity-level emotion recognition. We investigate results with and without using LLMs and analyse the large variability in performance across languages and text domains. We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition and discuss their impact and utility.
Problem

Research questions and friction points this paper is trying to address.

Addresses lack of emotion datasets in low-resource languages
Introduces BRIGHTER datasets for 28 languages
Explores monolingual and crosslingual emotion recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual emotion datasets creation
Low-resource languages coverage
LLMs in emotion recognition
🔎 Similar Papers
No similar papers found.
Shamsuddeen Hassan Muhammad
Shamsuddeen Hassan Muhammad
Bayero University, Kano, & Google DeepMind Academic Fellow at Imperial College London
Natural Language ProcessingSentiment AnalysisAfricaNLPLow-resource NLPMultilinguality
N
N. Ousidhoum
Cardiff University
Idris Abdulmumin
Idris Abdulmumin
Postdoctoral Fellow, DSFSI, University of Pretoria
Machine TranslationNeural Machine TranslationNatural Language ProcessingInternet Technology
Jan Philip Wahle
Jan Philip Wahle
University of Göttingen (Prev: NRC Canada)
Artificial IntelligenceMachine LarningDeep LearningNatural Language Processing
Terry Ruas
Terry Ruas
University of Göttingen (Prev: Uni. of Michigan, NII Tokyo, Uni. of Wuppertal, UFABC)
Natural Language ProcessingLexical SemanticsText GenerationParaphrasing
M
Meriem Beloucif
Uppsala University
Christine de Kock
Christine de Kock
NLP researcher, Melbourne University
natural language processingmachine learning
Nirmal Surange
Nirmal Surange
International Institute of Information Technology Hyderabad
Natural Language ProcessingNLG EvaluationGraph NLPIndicLanguages
D
Daniela Teodorescu
University of Alberta
I
I. Ahmad
Northeastern University
D
D. Adelani
MILA, McGill University, Canada CIFAR AI Chair
Alham Fikri Aji
Alham Fikri Aji
MBZUAI, Monash Indonesia
MultilingualityLow-resource NLPLanguage ModelingMachine Translation
F
Felermino D. M. A. Ali
LIACC, FEUP, University of Porto
I
I. Alimova
Vladimir Araujo
Vladimir Araujo
AI Research Scientist, Sailplane
Natural Language ProcessingDeep LearningContinual LearningRecommender Systems
Nikolay Babakov
Nikolay Babakov
University of Santiago de Compostela
Natural Language ProcessingBayesian Networks
N
Naomi Baes
University of Melbourne
Ana-Maria Bucur
Ana-Maria Bucur
Dalle Molle Institute for Artificial Intelligence (IDSIA), Università della Svizzera italiana
Computational LinguisticsMental Health
A
Andiswa Bukula
SADiLaR
G
Guanqun Cao
University of York
R
Rodrigo Tufino Cardenas
Universidad Politécnica Salesiana
Rendi Chevi
Rendi Chevi
MBZUAI
EmotionAffective ComputingNLPSpeech
C
C. Chukwuneke
Lancaster University
A
Alexandra Ciobotaru
University of Bucharest
Daryna Dementieva
Daryna Dementieva
TUM
NLPNLP for Social GoodHarmful Textual InformationMultilingualismResponsible AI
M
Murja Sani Gadanya
Bayero University Kano
R
Robert Geislinger
Hamburg University
B
Bela Gipp
University of Göttingen
O
Oumaima Hourrane
Al Akhawayn University
O
O. Ignat
Santa Clara University
F
F. I. Lawan
Kaduna State University
R
Rooweither Mabuya
SADiLaR
Rahmad Mahendra
Rahmad Mahendra
Universitas Indonesia and RMIT University
Natural Language ProcessingInformation ExtractionText MiningRecommender System
V
V. Marivate
DSFI, University of Pretoria
Andrew Piper
Andrew Piper
Professor of Languages, Literatures, and Cultures, McGill University
storytellingAIdata science
Alexander Panchenko
Alexander Panchenko
Associate Professor for Natural Language Processing
natural language processingword sense disambiguationtext style transferargument mininggraph
C
Charles Henrique Porto Ferreira
Centro Universitário FEI
V
Vitaly Protasov
S
Samuel Rutunda
Digital Umuganda
Manish Shrivastava
Manish Shrivastava
International Institute of Information Technology Hyderabad
Natural Language ProcessingMachine LearningMachine TranslationCross Lingual IRMultilingual Question Answering
A
Aura Cristina Udrea
National University of Science and Technology Politehnica Bucharest
L
Lilian D. A. Wanzare
Maseno University
Sophie Wu
Sophie Wu
M.A. Digital Humanities candidate, McGill University
F
Florian Valentin Wunderlich
University of Göttingen
H
Hanif Muhammad Zhafran
Institut Teknologi Bandung
T
Tianhui Zhang
University of Liverpool
Y
Yi Zhou
Cardiff University
S
Saif Mohammad
National Research Council Canada