🤖 AI Summary
To address the suboptimal adaptation of pretrained models for sentiment analysis in low-resource languages such as Hausa, this paper applies Language-Adaptive Fine-Tuning (LAFT): AfriBERTa is first adapted, without supervision, on an unlabeled Hausa corpus and then fine-tuned on the labeled NaijaSenti sentiment dataset. This work represents the first application of LAFT to Hausa sentiment analysis. Experiments show consistent but modest gains from LAFT, which may stem from a mismatch between the formal Hausa text used for adaptation and the informal social media text targeted by the task. AfriBERTa substantially outperforms multilingual baselines that were not pretrained specifically on Hausa, underscoring the importance of language-targeted pretraining in low-resource settings. All data and code are publicly released to advance NLP research for African languages.
📝 Abstract
Sentiment analysis (SA) plays a vital role in Natural Language Processing (NLP) by identifying the sentiments expressed in text. Although significant advances have been made in SA for widely spoken languages, low-resource languages such as Hausa face unique challenges, primarily due to a lack of digital resources. This study investigates the effectiveness of Language-Adaptive Fine-Tuning (LAFT) for improving SA performance in Hausa. We first curate a diverse, unlabeled corpus to expand the model's linguistic capabilities and apply LAFT to adapt AfriBERTa to the nuances of the Hausa language. The adapted model is subsequently fine-tuned on the labeled NaijaSenti sentiment dataset to evaluate its performance. Our findings show that LAFT yields modest improvements, which may be attributed to the use of formal Hausa text rather than informal social media data for adaptation. Nevertheless, the pre-trained AfriBERTa model significantly outperformed models not specifically trained on Hausa, highlighting the importance of language-specific pre-trained models in low-resource contexts. This research emphasizes the need for diverse data sources to advance NLP applications for low-resource African languages. To encourage further research and facilitate reproducibility in low-resource NLP, the code and dataset are publicly available at: https://github.com/Sani-Abdullahi-Sani/Natural-Language-Processing/blob/main/Sentiment%20Analysis%20for%20Low%20Resource%20African%20Languages
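LAFT is commonly implemented as continued masked-language-modelling (MLM) pre-training of the base model (here AfriBERTa) on unlabeled target-language text, performed before supervised fine-tuning. The sketch below illustrates only the token-masking step of that objective in plain Python. It assumes the standard BERT-style recipe (select 15% of non-special tokens; replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged) together with the common -100 "ignore" label. The abstract does not state the paper's actual masking settings, and the function name and arguments are hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

IGNORE_INDEX = -100  # assumed label value that the MLM loss skips


def mask_for_mlm(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: build (inputs, labels) for one MLM example.

    Each non-special token is selected with probability `mlm_probability`.
    Selected tokens become the mask token 80% of the time, a random token
    10% of the time, and stay unchanged 10% of the time. Labels hold the
    original id at selected positions and IGNORE_INDEX everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens (e.g. <s>, </s>); skip unselected tokens.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_token_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # Otherwise the original token is kept as-is.
    return inputs, labels
```

In practice this step is usually delegated to a library. For example, Hugging Face Transformers' `DataCollatorForLanguageModeling` with `mlm=True` and `mlm_probability=0.15` applies the same 80/10/10 scheme to batches. After adaptation, the resulting checkpoint is fine-tuned on the labeled sentiment data as an ordinary sequence classifier.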