🤖 AI Summary
This study addresses multilingual multi-label emotion detection across 28 languages. Method: We systematically evaluate leading multilingual encoders (mE5, BGE, XLM-R, and mBERT) under both prompt-based encoding and full-parameter fine-tuning paradigms. We propose an ensemble framework that fuses multi-configuration BGE embeddings via a CatBoost classifier optimized for macro-F1 in multi-label settings. Contribution/Results: We find that frozen prompt-based encoders (e.g., mE5, BGE) paired with lightweight classifiers substantially outperform fully fine-tuned XLM-R and mBERT. Our final system achieves a mean macro-F1 of 56.58 on SemEval-2025 Task 11 Track A, ranking among the top submissions. This demonstrates the effectiveness, efficiency, and scalability of lightweight, prompt-driven multilingual emotion modeling, offering a viable alternative to resource-intensive full fine-tuning.
📝 Abstract
This paper presents our approach to SemEval-2025 Task 11 Track A, which focuses on multi-label emotion classification across 28 languages. We explore two main strategies, fully fine-tuning transformer models and classifier-only training, and evaluate different settings such as fine-tuning strategies, model architectures, loss functions, encoders, and classifiers. Our findings suggest that training a classifier on top of prompt-based encoders such as mE5 and BGE yields significantly better results than fully fine-tuning XLM-R and mBERT. Our best-performing system on the final leaderboard is an ensemble of multiple BGE models under different configurations, with CatBoost serving as the classifier. This ensemble achieves an average macro-F1 score of 56.58 across all languages.
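The classifier-on-frozen-encoder recipe described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random matrix `X` stands in for frozen mE5/BGE sentence embeddings, the random matrix `Y` stands in for the multi-label emotion annotations, and scikit-learn's one-vs-rest logistic regression is used as a lightweight stand-in for the CatBoost classifier used in the paper.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Stand-in for frozen encoder outputs: in the real system these would be
# sentence embeddings from a frozen mE5/BGE model (dim ~1024, not 64).
X = rng.normal(size=(200, 64))

# Stand-in multi-label targets: 5 binary emotion labels per example
# (e.g., joy, anger, fear, sadness, surprise).
Y = (rng.random((200, 5)) < 0.3).astype(int)

# Lightweight multi-label classifier trained on top of the frozen
# embeddings; one independent binary classifier per emotion label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X[:150], Y[:150])

# Predict label sets for held-out examples and score with macro-F1,
# the metric used for the shared-task leaderboard.
pred = clf.predict(X[150:])
score = f1_score(Y[150:], pred, average="macro", zero_division=0)
```

In the paper's full system, several such classifiers (built from different BGE configurations, with CatBoost as the classifier) are ensembled, and macro-F1 is averaged across all 28 language tracks.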