🤖 AI Summary
This study addresses the fine-grained identification of hope-related expressions in social media. We propose the first cross-lingual, three-class hope classification framework for English, Urdu, and Spanish, distinguishing Generalized Hope, Realistic Hope, and Unrealistic Hope. Leveraging XLM-RoBERTa, we fine-tune on PolyHope, a newly constructed multilingual dataset, and significantly improve classification performance and model generalization, especially for low-resource languages such as Urdu. Our contributions are threefold: (1) the first formal definition and annotation of fine-grained, multilingual hope categories; (2) a joint training strategy explicitly designed to accommodate cross-lingual semantic variation; and (3) state-of-the-art performance on the PolyHope-M 2025 shared task, with superior macro-F1 scores across all languages and, most notably, a 4.2-point gain for Urdu. This work establishes a scalable technical foundation for modeling positive online discourse and supporting mental health interventions.
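The summary mentions a joint training strategy across languages but does not describe how the three languages are combined. As one illustration only, the sketch below shows temperature-smoothed language sampling, a common way to pool multilingual data so that a smaller language (here, hypothetically, Urdu) is not drowned out by larger ones. This is an assumed, swapped-in technique, not the authors' method; the function names and the per-language counts are hypothetical.

```python
import random


def language_sampling_probs(counts, alpha=0.5):
    """Temperature-smoothed sampling: p_l is proportional to (n_l / N) ** alpha.

    alpha = 1.0 reproduces proportional sampling; alpha < 1.0 shifts
    probability mass toward smaller (e.g. lower-resource) languages.
    """
    total = sum(counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


def sample_joint_batch(data_by_lang, probs, batch_size, rng):
    """Draw a mixed-language batch: pick a language by probs, then an example from it."""
    langs = list(probs)
    weights = [probs[lang] for lang in langs]
    batch = []
    for _ in range(batch_size):
        lang = rng.choices(langs, weights=weights, k=1)[0]
        batch.append((lang, rng.choice(data_by_lang[lang])))
    return batch


# Hypothetical per-language training-set sizes (not PolyHope's real counts).
counts = {"en": 8000, "es": 6000, "ur": 2000}
probs = language_sampling_probs(counts, alpha=0.5)
print({lang: round(p, 3) for lang, p in probs.items()})
```

With these made-up counts, Urdu's share of sampled examples rises from 12.5% under proportional sampling to about 21% at `alpha=0.5`; lowering `alpha` further moves the mixture closer to uniform across languages.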
📝 Abstract
The detection of hopeful speech in social media has emerged as a critical task for promoting positive discourse and well-being. In this paper, we present a machine learning approach to multiclass hope speech detection across multiple languages, including English, Urdu, and Spanish. We leverage transformer-based models, specifically XLM-RoBERTa, to detect and categorize hope speech into three distinct classes: Generalized Hope, Realistic Hope, and Unrealistic Hope. Our proposed methodology is evaluated on the PolyHope dataset for the PolyHope-M 2025 shared task, achieving competitive performance across all languages. We compare our results with existing models, demonstrating that our approach significantly outperforms prior state-of-the-art techniques in terms of macro F1 scores. We also discuss the challenges in detecting hope speech in low-resource languages and the potential for improving generalization. This work contributes to the development of multilingual, fine-grained hope speech detection models, which can be applied to enhance positive content moderation and foster supportive online communities.
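Results are compared by macro F1 per language. Macro F1 averages the per-class F1 scores with equal weight, so a rarely predicted class counts as much as a frequent one. The sketch below computes it separately for each language from a pooled list of (language, gold, predicted) rows, as a single multilingual model would produce. The label names follow the abstract; the function names and the toy rows are hypothetical. In practice, scikit-learn's `f1_score(..., average="macro")` computes the same per-language quantity.

```python
from collections import defaultdict

# Class names taken from the abstract's three-way scheme.
LABELS = ("Generalized Hope", "Realistic Hope", "Unrealistic Hope")


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    f1s = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


def macro_f1_by_language(rows, labels=LABELS):
    """Group (lang, gold, pred) rows by language and score each group separately."""
    grouped = defaultdict(lambda: ([], []))
    for lang, gold, pred in rows:
        grouped[lang][0].append(gold)
        grouped[lang][1].append(pred)
    return {lang: macro_f1(g, p, labels) for lang, (g, p) in grouped.items()}


# Hypothetical toy predictions, not real system output.
rows = [
    ("en", "Generalized Hope", "Generalized Hope"),
    ("en", "Realistic Hope", "Realistic Hope"),
    ("en", "Unrealistic Hope", "Generalized Hope"),
    ("en", "Generalized Hope", "Generalized Hope"),
    ("ur", "Generalized Hope", "Generalized Hope"),
    ("ur", "Realistic Hope", "Realistic Hope"),
    ("ur", "Unrealistic Hope", "Unrealistic Hope"),
]
print(macro_f1_by_language(rows))
```

In the English toy rows, the single missed Unrealistic Hope example zeroes that class's F1, pulling macro F1 down to 0.6 even though three of four predictions are correct. This sensitivity to rare classes is why macro F1 is the usual headline metric for fine-grained, imbalanced label sets.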