MAGE: Multi-Head Attention Guided Embeddings for Low Resource Sentiment Classification

📅 2025-02-25
🤖 AI Summary
Sentiment classification for low-resource Bantu languages is held back by the scarcity of high-quality annotated data. To address this, the paper proposes a method that combines Language-Independent Data Augmentation (LiDA) with multi-head attention-weighted embeddings. The model is initialized via cross-lingual transfer learning. Multi-head attention then scores the importance of each sample so that augmentation is applied selectively to the most salient instances, while weighted word and sentence embeddings strengthen the semantic representation. Experiments across multiple Bantu languages are reported to improve on strong baselines, with average accuracy gains of 4.2–7.8 percentage points. The work is presented as the first integration of LiDA with attention-driven embedding weighting, and as a scalable, robust paradigm for low-resource NLP that is zero-shot with respect to target-language labels.
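The attention-weighted embedding and selective augmentation steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-head query vectors, the `importance` heuristic (how peaked the attention is), and the Gaussian perturbation standing in for LiDA are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(tokens, queries):
    """Pool token embeddings with one query vector per head.

    tokens:  (n_tokens, d_model) token embeddings for one sentence
    queries: (n_heads, d_head) hypothetical per-head query vectors,
             with d_model == n_heads * d_head
    Returns the pooled (d_model,) sentence embedding and the
    (n_heads, n_tokens) attention weights.
    """
    n_heads, d_head = queries.shape
    n_tokens = tokens.shape[0]
    # Split each token embedding into per-head slices: (n_heads, n_tokens, d_head).
    heads = tokens.reshape(n_tokens, n_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product score of every token against each head's query.
    scores = np.einsum("hd,htd->ht", queries, heads) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of token slices per head, concatenated across heads.
    pooled = np.einsum("ht,htd->hd", weights, heads).reshape(-1)
    return pooled, weights


def importance(weights):
    # Assumed saliency heuristic: peak attention weight, averaged over heads.
    return float(weights.max(axis=-1).mean())


def select_and_augment(sentences, queries, top_k, noise_scale=0.01, seed=0):
    """Embed all sentences, pick the top_k most salient, and augment them."""
    rng = np.random.default_rng(seed)
    pooled, scores = [], []
    for tokens in sentences:
        emb, w = attention_pool(tokens, queries)
        pooled.append(emb)
        scores.append(importance(w))
    pooled = np.stack(pooled)
    # Indices of the top_k highest-scoring sentences, best first.
    chosen = np.argsort(scores)[::-1][:top_k]
    # Placeholder for LiDA: small Gaussian perturbation in embedding space.
    noise = noise_scale * rng.standard_normal(pooled[chosen].shape)
    synthetic = pooled[chosen] + noise
    return pooled, chosen, synthetic
```

In this sketch, only the sentences ranked most salient receive synthetic neighbours, which mirrors the idea of spending the augmentation budget on critical data points. Because the augmentation acts on embeddings rather than on raw text, it makes no assumptions about the language of the input.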

📝 Abstract
The lack of quality data for low-resource Bantu languages presents significant challenges for text classification and other practical applications. In this paper, we introduce a model that combines Language-Independent Data Augmentation (LiDA) with Multi-Head Attention-based weighted embeddings to selectively enhance critical data points and improve text classification performance. This integration yields data augmentation strategies that are effective across varied linguistic contexts, so that the model can handle the distinctive syntactic and semantic features of Bantu languages. The approach addresses the data scarcity issue and lays a foundation for future research in low-resource language processing and classification tasks.
Problem

Research questions and friction points this paper is trying to address.

Low-resource Bantu languages
Text classification challenges
Data scarcity in sentiment analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Head Attention Embeddings
Language-Independent Data Augmentation
Low-Resource Sentiment Classification
Varun Vashisht
School of Computer Science and Engineering, Vellore Institute of Technology
Samar Singh
School of Computer Science and Engineering, Vellore Institute of Technology
Mihir Konduskar
School of Computer Science and Engineering, Vellore Institute of Technology
Jaskaran Singh Walia
Microsoft, Carnegie Mellon University
Computer Vision, LLMs, Graph Theory, NLP, Causal Inference
Vukosi Marivate
University of Pretoria, Lelapa AI, Deep Learning Indaba, Masakhane Research Foundation
Data Science, Natural Language Processing, Machine Learning, Artificial Intelligence, Reinforcement