🤖 AI Summary
To address the limited sentiment classification performance in low-resource Bantu languages, attributed to the scarcity of high-quality annotated data, this paper proposes a novel method integrating Language-Independent Data Augmentation (LiDA) with multi-head attention-weighted embeddings. The approach initializes the model via cross-lingual transfer learning, employs multi-head attention to dynamically assess sample importance, and adaptively augments the most salient instances; concurrently, attention-weighted word and sentence embeddings enhance semantic representation. Experiments across multiple Bantu languages demonstrate substantial improvements over strong baselines, with average accuracy gains of 4.2–7.8 percentage points. This work is the first to integrate LiDA with attention-driven embedding weighting, establishing a scalable, robust paradigm for low-resource NLP that requires no labeled data in the target language.
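The attention-weighted embedding idea described above can be illustrated with a minimal sketch. This is not the paper's implementation; the per-head query vectors here are random placeholders standing in for learned parameters, and `attention_weighted_embedding` is a hypothetical name. It only shows the mechanics: each head scores every token, scores are normalized over the sequence, and the weighted sums from all heads are concatenated into one sentence vector.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weighted_embedding(token_embs, n_heads=2, rng=None):
    """Pool token embeddings into a single sentence embedding using
    per-head attention weights (queries are random here; the paper's
    model would learn them during training)."""
    rng = np.random.default_rng(0) if rng is None else rng
    seq_len, dim = token_embs.shape
    assert dim % n_heads == 0, "embedding dim must divide evenly across heads"
    head_dim = dim // n_heads
    # Hypothetical per-head query vectors (stand-ins for learned parameters).
    queries = rng.normal(size=(n_heads, head_dim))
    heads = token_embs.reshape(seq_len, n_heads, head_dim)
    # Score each token per head, then normalize over the sequence dimension.
    scores = np.einsum('thd,hd->th', heads, queries) / np.sqrt(head_dim)
    weights = softmax(scores, axis=0)              # (seq_len, n_heads)
    # Attention-weighted sum per head, heads concatenated back to full dim.
    pooled = np.einsum('th,thd->hd', weights, heads).reshape(dim)
    return pooled, weights

# Usage: 6 tokens, 8-dim embeddings, 2 heads -> one 8-dim sentence vector.
tokens = np.random.default_rng(1).normal(size=(6, 8))
sentence_vec, attn = attention_weighted_embedding(tokens, n_heads=2)
```

Each column of `attn` sums to 1, so every head distributes its pooling weight across the tokens it finds most informative.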
📝 Abstract
The scarcity of high-quality annotated data for low-resource Bantu languages presents significant challenges for text classification and other practical applications. In this paper, we introduce a model that combines Language-Independent Data Augmentation (LiDA) with multi-head attention-based weighted embeddings to selectively enhance critical data points and improve text classification performance. This integration yields data augmentation strategies that remain effective across varied linguistic contexts, enabling the model to handle the distinctive syntactic and semantic features of Bantu languages. The approach not only addresses data scarcity but also lays a foundation for future research in low-resource language processing and classification tasks.
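The "selectively enhance critical data points" step can be sketched as a simple ranking problem: given an importance score per training sample (e.g. derived from the attention weights), augment only the top fraction. This is an illustrative sketch, not the paper's procedure; `select_for_augmentation` and `top_frac` are hypothetical names.

```python
import numpy as np

def select_for_augmentation(importance, top_frac=0.3):
    """Return indices of the most salient samples, i.e. the candidates
    to pass to a data-augmentation routine such as LiDA.

    importance -- 1-D array of per-sample importance scores
                  (assumed here to come from attention weights).
    top_frac   -- fraction of the dataset to augment (illustrative default).
    """
    importance = np.asarray(importance)
    k = max(1, int(len(importance) * top_frac))
    # Sort descending by score and keep the top k indices.
    return np.argsort(importance)[::-1][:k]

# Usage: with scores [0.1, 0.9, 0.5, 0.2] and top_frac=0.5,
# samples 1 and 2 would be selected for augmentation.
chosen = select_for_augmentation([0.1, 0.9, 0.5, 0.2], top_frac=0.5)
```

Restricting augmentation to salient samples keeps the synthetic data focused on instances the model already treats as informative, rather than amplifying noise uniformly across a small corpus.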