AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic analysis and annotated resources for positive discourse, such as hope, resilience, and solidarity, in Arabic-language social media during crises. It proposes a definition and annotation framework for hope-related expressions in Arabic and introduces AraHopeCorpus, the first annotated dataset of Arabic hope speech, built from 10,000 YouTube comments related to the Gaza war (2023–2024). Comments were manually annotated as hope speech, no hope speech, or neutral/unclear, with substantial inter-annotator agreement (Cohen's Kappa = 0.71). Hope speech accounts for more than 64% of the corpus and appears mainly as religious encouragement, collective solidarity, and optimism about endurance and justice. A comparison between human annotators and ChatGPT shows that large language models can support annotation but remain limited in handling dialectal variation and culturally embedded meanings. The dataset is to be released for research purposes under an open, non-commercial license.

📝 Abstract

Social media has become a crucial arena for shaping public narratives during armed conflicts, providing space for both harmful and constructive communication. While hate speech and misinformation have been widely studied, expressions that promote resilience, solidarity, and optimism remain underexplored, particularly in Arabic contexts. This paper introduces AraHopeCorpus, the first annotated dataset of Arabic hope speech, collected from ten thousand YouTube comments related to the war on Gaza between 2023 and 2024. Using a detailed annotation framework, comments were classified into three categories: hope speech, no hope speech, and neutral or unclear discourse. The dataset shows that hopeful language dominates, accounting for more than 64 percent of all comments. These expressions of hope appear mainly as religious encouragement, collective solidarity, and optimism for endurance and justice. No hope speech, representing about 13 percent, reflects despair and disillusionment, while the rest of the comments contain neutral or mixed content. Inter-annotator agreement reached substantial levels (Cohen's Kappa = 0.71), though dialectal variation, sarcasm, and implicit meaning posed annotation challenges. A comparative analysis between human annotators and ChatGPT revealed that large language models can support annotation but remain limited in handling dialectal and culturally embedded expressions. AraHopeCorpus will be released for research purposes under an open, non-commercial license. It provides a valuable resource for studying constructive digital discourse, enabling further research on hope speech detection, crisis communication, and resilience in Arabic social media.
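
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's Kappa can be computed for two annotators who label the same items. It is not the authors' code: the function name `cohens_kappa`, the label strings, and the toy annotations are all hypothetical and chosen only for illustration. Kappa compares the observed agreement p_o with the agreement p_e expected by chance from each annotator's label frequencies, as (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, multiply the two annotators'
    # marginal frequencies, then sum over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy annotations, invented for illustration (not from AraHopeCorpus).
annotator_1 = ["hope", "hope", "no_hope", "neutral",
               "hope", "neutral", "no_hope", "hope"]
annotator_2 = ["hope", "hope", "no_hope", "hope",
               "hope", "neutral", "neutral", "hope"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 6 of 8 items (p_o = 0.75), while chance agreement is 26/64, so kappa falls well below the raw agreement rate. This gap is why the abstract reports kappa rather than simple percent agreement.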

Problem

Research questions and friction points this paper is trying to address.

hope speech

Arabic social media

crisis discourse

constructive communication

annotation dataset

Innovation

Methods, ideas, or system contributions that make the work stand out.

hope speech

Arabic social media

crisis discourse