🤖 AI Summary
Cross-lingual multi-label emotion detection in short texts is difficult because of high linguistic diversity and a severe scarcity of annotated data for low-resource languages.
Method: This paper proposes a feature-centric framework that adapts document representations and learning algorithms to each language, aiming to improve cross-lingual generalization while keeping the models simple. It combines heterogeneous document representations (TF-IDF, FastText, and Sentence-BERT) with PCA-based dimensionality reduction and classifiers such as a Multi-Layer Perceptron (MLP), trading off efficiency against accuracy.
Contribution/Results: Evaluated across 28 languages, the framework shows that TF-IDF remains highly effective for low-resource languages, where it can outperform semantic embedding methods, and that PCA reduces training time by over 70% with negligible loss of accuracy. The approach is efficient, scalable, and robust, offering a practical option for multilingual emotion detection under resource constraints.
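The representation and reduction steps can be illustrated with a small, dependency-free sketch. It assumes plain whitespace tokenization and uses the smoothed IDF weighting that scikit-learn's `TfidfVectorizer` applies by default, followed by L2 row normalization. PCA is computed here by power iteration with deflation purely for readability; a practical system would use an SVD-based implementation such as scikit-learn's `PCA`. All function names are hypothetical and are not taken from the authors' code.

```python
import math
import random
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; languages written without
    # spaces (e.g. Chinese) would need a language-specific tokenizer.
    return text.lower().split()


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_matrix(docs: list[str], idf: dict[str, float]) -> list[list[float]]:
    """Dense, L2-normalized TF-IDF rows over a fixed, sorted vocabulary."""
    vocab = sorted(idf)
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))  # tokens outside vocab are simply never read
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row] if norm > 0 else row)
    return rows


def pca_fit(X: list[list[float]], k: int, iters: int = 100, seed: int = 0):
    """Top-k principal components via power iteration on the sample
    covariance matrix, deflating after each component is found."""
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mean[j] for j in range(d)] for r in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 1e-3 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            s = math.sqrt(sum(x * x for x in w))
            if s < 1e-12:  # no variance left to explain
                break
            v = [x / s for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components


def pca_transform(X, mean, components):
    """Project centered rows onto the k components (d features -> k features)."""
    return [[sum((r[j] - mean[j]) * c[j] for j in range(len(mean))) for c in components]
            for r in X]


docs = ["i am so happy today", "this is scary", "happy and scared at once", "so scary today"]
idf = fit_idf(docs)
X = tfidf_matrix(docs, idf)
mean, comps = pca_fit(X, k=2)
Z = pca_transform(X, mean, comps)
print(len(X[0]), "->", len(Z[0]))  # 12 -> 2
```

Reducing each document from d features to k also shrinks the first weight matrix of a downstream MLP with hidden width h from d×h to k×h, which is one mechanism by which dimensionality reduction can lower training cost.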
📝 Abstract
This paper presents our system for SemEval 2025 Task 11: Bridging the Gap in Text-Based Emotion Detection (Track A), which focuses on multi-label emotion detection in short texts. We propose a feature-centric framework that dynamically adapts document representations and learning algorithms to optimize performance for each language. Our study evaluates three key components (document representation, dimensionality reduction, and model training) across 28 languages, with five of them analyzed in detail. The results show that TF-IDF remains highly effective for low-resource languages, while embedding-based representations, namely FastText word embeddings and transformer-based sentence embeddings produced by Sentence-BERT, exhibit language-specific strengths. Principal Component Analysis (PCA) reduces training time without compromising performance, particularly benefiting FastText representations and neural models such as Multi-Layer Perceptrons (MLPs). Our computational efficiency analysis highlights the trade-off between model complexity and processing cost. The framework provides a scalable solution for multilingual emotion detection that addresses the challenges of linguistic diversity and resource constraints.
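One plausible reading of "dynamically adapts document representations and learning algorithms" is a per-language search over candidate configurations that keeps whichever scores best on held-out data. The sketch assumes selection by macro-averaged F1 on a development split and uses a hypothetical candidate grid (the `"logreg"` option in particular is not named in the abstract); `evaluate` is a hypothetical callable standing in for training and scoring one configuration. It also shows the multi-label decision step, in which every emotion whose predicted probability reaches a threshold (0.5 here, an assumption) is emitted.

```python
from itertools import product
from typing import Callable, Optional

# (representation, dimensionality reducer or None, classifier); hypothetical names
Config = tuple[str, Optional[str], str]


def decode(probs: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Multi-label decision: keep every emotion whose probability reaches the threshold."""
    return {label for label, p in probs.items() if p >= threshold}


def macro_f1(gold: list[set[str]], pred: list[set[str]], labels: list[str]) -> float:
    """Unweighted mean of per-label F1; a label with no gold or predicted positives scores 0."""
    scores = []
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


def select_config(
    evaluate: Callable[[Config], float],
    representations=("tfidf", "fasttext", "sbert"),
    reducers=(None, "pca"),
    models=("logreg", "mlp"),
) -> tuple[Config, float]:
    """Score every candidate for one language; return the best config and its score."""
    scores = {cfg: evaluate(cfg) for cfg in product(representations, reducers, models)}
    best = max(scores, key=scores.get)  # ties resolve to the first candidate in grid order
    return best, scores[best]


gold = [{"joy"}, {"fear", "anger"}]
pred = [decode({"joy": 0.9, "fear": 0.1}), decode({"fear": 0.7, "anger": 0.3})]
print(macro_f1(gold, pred, ["anger", "fear", "joy"]))  # 0.6666666666666666
```

Running `select_config` once per language, with `evaluate` wrapping a real training run, yields a separate configuration for each language; that independence is what allows TF-IDF to win for some languages and embeddings for others.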