PromotionGo at SemEval-2025 Task 11: A Feature-Centric Framework for Cross-Lingual Multi-Emotion Detection in Short Texts

📅 2025-07-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Cross-lingual multi-label emotion detection in short texts is hampered by high linguistic diversity and a severe scarcity of annotated data for low-resource languages. Method: This paper proposes a feature-centric framework that selects document representations and learning algorithms per language, aiming to improve language-specific performance while keeping the models simple. It compares heterogeneous document representations (TF-IDF, FastText, and Sentence-BERT), combined with PCA-based dimensionality reduction and classifiers such as an MLP, to balance efficiency against accuracy. Contribution/Results: Evaluated across 28 languages, the framework shows that TF-IDF remains highly effective for low-resource languages, while FastText and Sentence-BERT exhibit language-specific strengths; PCA reduces training time by over 70% with negligible accuracy degradation. The approach is efficient and scalable, offering a practical option for multilingual emotion detection under resource constraints.

📝 Abstract

This paper presents our system for SemEval 2025 Task 11: Bridging the Gap in Text-Based Emotion Detection (Track A), which focuses on multi-label emotion detection in short texts. We propose a feature-centric framework that dynamically adapts document representations and learning algorithms to optimize language-specific performance. Our study evaluates three key components: document representation, dimensionality reduction, and model training in 28 languages, highlighting five for detailed analysis. The results show that TF-IDF remains highly effective for low-resource languages, while contextual embeddings like FastText and transformer-based document representations, such as those produced by Sentence-BERT, exhibit language-specific strengths. Principal Component Analysis (PCA) reduces training time without compromising performance, particularly benefiting FastText and neural models such as Multi-Layer Perceptrons (MLP). Computational efficiency analysis underscores the trade-off between model complexity and processing cost. Our framework provides a scalable solution for multilingual emotion detection, addressing the challenges of linguistic diversity and resource constraints.
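The three components named in the abstract (document representation, dimensionality reduction, model training) can be illustrated with a minimal sketch. It assumes scikit-learn, a toy English corpus, three made-up emotion labels, and arbitrary hyperparameters; none of these come from the paper, and the code is not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

# Toy corpus and labels: hypothetical placeholders, not task data.
texts = [
    "I am so happy today",
    "This makes me furious",
    "I feel scared and alone",
    "What a joyful surprise",
    "He was angry and afraid",
    "Such a happy and calm day",
    "The noise made her scared",
    "They were furious about it",
]
# Multi-label indicator matrix; columns are [joy, anger, fear] (assumed label set).
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# 1) Document representation: TF-IDF, densified so that PCA accepts it.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()

# 2) Dimensionality reduction: PCA down to 4 components (arbitrary choice).
pca = PCA(n_components=4, random_state=0)
X_reduced = pca.fit_transform(X)

# 3) Model training: an MLP fitted on the indicator matrix predicts one
#    binary decision per emotion, i.e. multi-label output.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X_reduced, labels)

# Inference reuses the fitted vectorizer and PCA on unseen text.
new_doc = vectorizer.transform(["I am happy but a little scared"]).toarray()
prediction = clf.predict(pca.transform(new_doc))
print(prediction)
```

Swapping the TF-IDF step for FastText or Sentence-BERT vectors would leave steps 2 and 3 unchanged, which is what makes the representation a per-language choice in a framework of this kind.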

Problem

Research questions and friction points this paper is trying to address.

Multi-label emotion detection in short texts across languages

Optimizing language-specific performance with adaptive document representations

Addressing linguistic diversity and resource constraints in emotion detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-centric framework adapts document representations dynamically

TF-IDF remains highly effective for low-resource languages, while FastText and Sentence-BERT embeddings show language-specific strengths

PCA reduces training time without performance loss
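One way to read "adapts document representations dynamically" is as a per-language selection step: score each candidate representation on held-out data and keep the best one for that language. The sketch below assumes this reading; the function name, the language keys, and all scores are hypothetical placeholders rather than results from the paper.

```python
from typing import Dict


def select_representation(val_scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return, for each language, the representation with the highest validation score."""
    return {
        lang: max(scores, key=scores.get)
        for lang, scores in val_scores.items()
    }


# Placeholder validation scores (made up for illustration only).
placeholder_scores = {
    "lang_a": {"tfidf": 0.50, "fasttext": 0.40, "sbert": 0.45},
    "lang_b": {"tfidf": 0.40, "fasttext": 0.42, "sbert": 0.55},
}

chosen = select_representation(placeholder_scores)
print(chosen)
```

The same pattern extends to choosing whether to apply PCA or which classifier to train, by widening the inner dictionary keys to full configurations.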
