University of Indonesia at SemEval-2025 Task 11: Evaluating State-of-the-Art Encoders for Multi-Label Emotion Detection

📅 2025-05-22
🤖 AI Summary
This study addresses multilingual multi-label emotion detection across 28 languages. Method: We systematically evaluate leading multilingual encoders (including mE5, BGE, XLM-R, and mBERT) under both prompt-based encoding and full-parameter fine-tuning paradigms. We propose an ensemble framework leveraging multi-configuration BGE embeddings, fused via a CatBoost classifier optimized for macro-F1 in multi-label settings. Contribution/Results: We observe that frozen prompt-based encoders (e.g., mE5/BGE) paired with lightweight classifiers substantially outperform fully fine-tuned XLM-R or mBERT. Our final system achieves a mean macro-F1 of 56.58 on SemEval-2025 Task 11 Track A, ranking among the top submissions. This demonstrates the effectiveness, efficiency, and scalability of lightweight, prompt-driven multilingual emotion modeling, offering a viable alternative to resource-intensive full fine-tuning.
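To make the classifier-only paradigm concrete, below is a minimal sketch of such a pipeline: embed texts with a frozen multilingual encoder, train one binary CatBoost model per emotion label, and score with macro-F1. The checkpoint name (intfloat/multilingual-e5-base), the "query: " prefix, and the CatBoost hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: frozen multilingual encoder + per-label CatBoost classifiers.
# Model name and hyperparameters are assumptions, not the paper's exact setup.
import numpy as np
from sentence_transformers import SentenceTransformer
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score

# Frozen prompt-based encoder; we only extract embeddings, no fine-tuning.
encoder = SentenceTransformer("intfloat/multilingual-e5-base")

def embed(texts):
    # E5-family models expect a task prefix; "query: " is a common choice.
    return encoder.encode([f"query: {t}" for t in texts], normalize_embeddings=True)

def train_multilabel(X_train, Y_train):
    # One binary CatBoost model per emotion label (one-vs-rest).
    models = []
    for j in range(Y_train.shape[1]):
        clf = CatBoostClassifier(iterations=500, verbose=False)
        clf.fit(X_train, Y_train[:, j])
        models.append(clf)
    return models

def predict_multilabel(models, X):
    # Stack per-label binary predictions into an (n_samples, n_labels) matrix.
    return np.stack([m.predict(X).astype(int) for m in models], axis=1)

# Usage (Y_tr / Y_te are (n_samples, n_labels) 0/1 emotion matrices):
# X_tr, X_te = embed(train_texts), embed(test_texts)
# models = train_multilabel(X_tr, Y_tr)
# print("macro-F1:", f1_score(Y_te, predict_multilabel(models, X_te), average="macro"))
```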

📝 Abstract
This paper presents our approach for SemEval-2025 Task 11 Track A, focusing on multi-label emotion classification across 28 languages. We explore two main strategies: fully fine-tuning transformer models and classifier-only training, evaluating different settings such as fine-tuning strategies, model architectures, loss functions, encoders, and classifiers. Our findings suggest that training a classifier on top of prompt-based encoders such as mE5 and BGE yields significantly better results than fully fine-tuning XLM-R and mBERT. Our best-performing model on the final leaderboard is an ensemble of multiple BGE models with different configurations, with CatBoost serving as the classifier. This ensemble achieves an average macro-F1 score of 56.58 across all languages.
Problem

Research questions and friction points this paper is trying to address.

Multilabel emotion classification across 28 languages
Comparing fine-tuning vs classifier-only training strategies
Evaluating encoder and classifier performance in emotion detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning transformer models for emotion detection
Using prompt-based encoders like mE5 and BGE
Ensemble of differently configured BGE models fused via a CatBoost classifier (see the sketch after this list)
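The sketch below illustrates one plausible fusion scheme for such an ensemble: average per-label probabilities from classifiers trained on embeddings from differently configured BGE encoders, then threshold. The averaging rule and the 0.5 cutoff are assumptions; the listing does not specify the paper's exact fusion details.

```python
# Sketch of ensemble fusion: average per-label probabilities across
# configurations, then threshold. Averaging and the 0.5 cutoff are
# assumptions, not necessarily the paper's exact method.
import numpy as np

def ensemble_predict(per_config_probs, threshold=0.5):
    """per_config_probs: list of (n_samples, n_labels) probability matrices,
    one per BGE configuration. Returns binary multi-label predictions."""
    mean_probs = np.mean(np.stack(per_config_probs, axis=0), axis=0)
    return (mean_probs >= threshold).astype(int)

# Usage: probs_a, probs_b, probs_c come from CatBoost models trained on
# embeddings from differently configured BGE encoders.
# Y_pred = ensemble_predict([probs_a, probs_b, probs_c])
```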
👥 Authors
Ikhlasul Akmal Hanif, Universitas Indonesia
Eryawan Presma Yulianrifat, Universitas Indonesia
Jaycent Gunawan Ongris, Universitas Indonesia
Eduardus Tjitrahardja, Universitas Indonesia
Muhammad Falensi Azmi, Universitas Indonesia
Rahmat Bryan Naufal, Universitas Indonesia
Alfan Farizki Wicaksono, Researcher, Faculty of Computer Science, Universitas Indonesia