🤖 AI Summary
This study addresses sentiment analysis of Arabic hotel reviews, where variation and mixing between Modern Standard Arabic (MSA) and regional dialects such as Saudi Arabic and Moroccan Darija complicate linguistic modeling. To tackle this, we construct the first balanced, multi-dialectal, domain-specific dataset for the hospitality sector: reviews originally written in MSA were manually translated into Saudi and Moroccan dialects, and native speakers validated the translations. We also launch the inaugural shared task on multi-dialectal Arabic sentiment analysis in hospitality to encourage practical adoption of dialect-aware NLP for customer experience analytics. More than 40 teams registered for the task, 12 submitted working systems, and the top-performing system achieved an F1 score of 0.81. Key contributions include: (1) releasing the first native-speaker-validated, cross-dialectal hotel review dataset; (2) establishing the first benchmark for multi-dialectal sentiment classification; and (3) empirically evaluating pretrained language models on low-resource Arabic dialects. Together, these provide infrastructure and empirical evidence for further research on dialect adaptation.
📝 Abstract
The hospitality industry in the Arab world increasingly relies on customer feedback to shape its services, which drives the need for robust Arabic sentiment analysis tools. To address this need, the Sentiment Analysis on Arabic Dialects in the Hospitality Domain shared task focuses on sentiment detection in Arabic dialects. The task uses a multi-dialect, manually curated dataset of hotel reviews originally written in Modern Standard Arabic (MSA) and translated into Saudi and Moroccan (Darija) dialects. The dataset consists of 538 sentiment-balanced reviews spanning positive, neutral, and negative categories. Native speakers validated the translations to ensure dialectal accuracy and sentiment preservation. This resource supports the development of dialect-aware NLP systems for real-world customer experience analysis. More than 40 teams registered for the shared task, and 12 submitted systems during the evaluation phase. The top-performing system achieved an F1 score of 0.81, which demonstrates both the feasibility and the remaining challenges of sentiment analysis across Arabic dialects.
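The abstract reports an F1 score for a three-class task (positive, neutral, negative) but does not state how per-class scores are averaged. Macro-averaged F1, the unweighted mean of per-class F1, is a common choice for balanced multi-class sentiment benchmarks. The minimal sketch below assumes that scheme. The function name `macro_f1` and the toy `gold`/`pred` labels are hypothetical illustrations, not the shared task's official scorer or data.

```python
# Hypothetical sketch: macro-averaged F1 for three sentiment classes.
# Assumes macro averaging; the shared task's actual scoring may differ.
LABELS = ("positive", "neutral", "negative")


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Each class contributes equally, regardless of how many examples it has.
    return sum(scores) / len(scores)


# Toy, made-up labels for illustration only (not from the dataset).
gold = ["positive", "neutral", "negative", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "positive", "neutral", "neutral"]

print(round(macro_f1(gold, pred), 4))  # 0.6667
```

Here the positive class scores 1.0 and the neutral and negative classes each score 0.5, giving a macro F1 of about 0.67. Because every class is weighted equally, errors on any one sentiment category lower the score as much as errors on any other, which suits a balanced dataset like the one described.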