🤖 AI Summary
This work addresses sentiment analysis of Bengali social media text, a low-resource setting rich in regional and cultural expressions. We propose a hybrid multi-model system that combines traditional machine learning and deep learning, evaluated on the EmoNoBa dataset (22,698 comments). Methodologically, we introduce the first integration of LIME with AdaBoost decision trees to enable fine-grained, interpretable predictions; design a collaborative BiLSTM–AdaBoost modeling framework to enhance representation learning for low-resource languages; and empirically validate PCA's effectiveness in reducing the dimensionality of TF-IDF-weighted n-gram features. Results demonstrate that the AdaBoost+LIME approach maintains high accuracy while substantially improving model transparency and debuggability. Our framework provides a reproducible, interpretable technical pathway for sentiment analysis in low-resource languages, advancing both practical applicability and analytical rigor in culturally nuanced NLP tasks.
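The TF-IDF + n-gram + PCA step can be made concrete with the minimal, dependency-free sketch below. It is not the authors' implementation. The tokenizer, the n-gram range, and the number of PCA components are not stated, so the code assumes whitespace tokenization, word unigrams and bigrams (`n_max=2`), the smoothed IDF `ln((1 + N) / (1 + df)) + 1` that scikit-learn's `TfidfVectorizer` applies by default, L2-normalized rows, and PCA computed by power iteration with deflation. The helpers `word_ngrams`, `tfidf`, and `pca`, and the four toy Bangla sentences, are hypothetical and are not taken from EmoNoBa.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Assumed tokenization: split on whitespace, emit word n-grams with 1 <= n <= n_max."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_max=2):
    """Return (vocab, rows); each row is an L2-normalized TF-IDF vector over n-grams."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    vocab = sorted({g for c in counts for g in c})
    df = Counter(g for c in counts for g in c)  # iterating a Counter yields unique keys
    n = len(docs)
    idf = {g: math.log((1 + n) / (1 + df[g])) + 1 for g in vocab}  # smoothed IDF
    rows = []
    for c in counts:
        row = [c[g] * idf[g] for g in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


def pca(rows, k, iters=300, seed=0):
    """Project rows onto the top-k principal components (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - mean[j] for j in range(d)] for r in rows]  # center each feature
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):  # repeated multiplication converges to the dominant eigenvector
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass finds the next-largest component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x[j] * c[j] for j in range(d)) for c in comps] for x in X]


docs = ["আমি খুব খুশি", "আমি খুব দুঃখিত", "খুব ভালো লাগলো", "খুব খারাপ লাগলো"]  # hypothetical toy data
vocab, rows = tfidf(docs)
reduced = pca(rows, k=2)
print(len(vocab), "TF-IDF features ->", len(reduced[0]), "PCA components")
```

On a corpus of EmoNoBa's size, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` followed by `PCA`, or `TruncatedSVD` (which accepts the sparse TF-IDF matrix directly), would be the practical choice; the hand-written version only shows what those steps compute.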
📝 Abstract
Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis on 22,698 social media comments from the EmoNoBa dataset. We train machine learning models (Linear SVM, KNN, and Random Forest) on n-gram features produced by a TF-IDF vectorizer, and we additionally investigate the effect of PCA-based dimensionality reduction. We also apply a BiLSTM model and use AdaBoost to boost decision trees. To make our models easier to interpret, we use LIME to explain the predictions of the decision-tree-based AdaBoost classifier. With the goal of advancing sentiment analysis in languages with limited resources, our work examines a range of techniques to identify efficient approaches to emotion identification in Bangla.
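The AdaBoost + LIME combination described above can be illustrated with the following minimal sketch, which is an illustration under stated assumptions rather than the authors' code. It assumes binary labels (+1 / -1) instead of the dataset's emotion categories, decision stumps over word-presence features as the weak learners, the logistic link 1 / (1 + exp(-2F(x))) to turn the boosted score into a probability, and a simplified LIME-style explainer: each word of the instance is kept or dropped at random, perturbed texts are weighted by an exponential kernel on cosine distance to the original, and a weighted ridge regression yields one weight per word. The function names, the kernel width, and the toy sentences are hypothetical.

```python
import math
import random


def featurize(text, vocab):
    """Binary bag of words: 1 if the vocabulary word occurs in the text, else 0."""
    tokens = set(text.split())
    return [1 if v in tokens else 0 for v in vocab]


def stump(x, j, s):
    """Decision stump (depth-1 tree): predict s if feature j is present, else -s."""
    return s if x[j] else -s


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost over decision stumps; labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []  # one (alpha, feature index, sign) triple per boosting round
    for _ in range(rounds):
        best = None
        for j in range(d):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump(xi, j, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, s)
        err, j, s = best
        if err >= 0.5:
            break  # no stump beats chance on the reweighted data
        err = max(err, 1e-10)  # avoid division by zero on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, s))
        # Misclassified samples gain weight, correct ones lose it; then renormalize
        w = [wi * math.exp(-alpha * yi * stump(xi, j, s)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict_proba(model, vocab, text):
    """P(y = +1 | text) from the boosted score F(x) via 1 / (1 + exp(-2 F(x)))."""
    x = featurize(text, vocab)
    score = sum(alpha * stump(x, j, s) for alpha, j, s in model)
    return 1.0 / (1.0 + math.exp(-2.0 * score))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def explain(predict, text, num_samples=200, width=0.25, ridge=1.0, seed=0):
    """LIME-style explanation: per-word weights of a locally weighted ridge surrogate."""
    words = list(dict.fromkeys(text.split()))  # unique words in original order
    m, rng = len(words), random.Random(seed)
    # First mask keeps every word; the rest keep each word with probability 1/2
    masks = [[1] * m] + [[rng.randint(0, 1) for _ in range(m)] for _ in range(num_samples - 1)]
    rows, targets, weights = [], [], []
    for z in masks:
        perturbed = " ".join(wd for wd, keep in zip(words, z) if keep)
        dist = 1 - math.sqrt(sum(z) / m)  # cosine distance to the all-ones mask (1 if empty)
        rows.append([1.0] + [float(v) for v in z])  # leading 1.0 is the intercept column
        targets.append(predict(perturbed))
        weights.append(math.exp(-dist ** 2 / width ** 2))  # exponential proximity kernel
    k = m + 1
    # Normal equations (Z^T W Z + ridge * I) beta = Z^T W p, intercept left unpenalized
    A = [[sum(wt * r[a] * r[b] for r, wt in zip(rows, weights))
          + (ridge if (a == b and a > 0) else 0.0) for b in range(k)] for a in range(k)]
    rhs = [sum(wt * r[a] * t for r, wt, t in zip(rows, weights, targets)) for a in range(k)]
    beta = solve(A, rhs)
    return dict(zip(words, beta[1:]))


texts = ["সিনেমা খুব ভালো", "গান ভালো লাগলো", "খুব খুশি লাগলো",
         "সিনেমা খুব খারাপ", "গান খারাপ লাগলো", "খুব দুঃখ লাগলো"]
labels = [1, 1, 1, -1, -1, -1]  # hypothetical toy labels: +1 positive, -1 negative
vocab = sorted({w for t in texts for w in t.split()})
model = train_adaboost([featurize(t, vocab) for t in texts], labels, rounds=3)
weights = explain(lambda t: predict_proba(model, vocab, t), "সিনেমা খুব ভালো")
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

On this toy data the three rounds select stumps on খারাপ, ভালো, and খুশি, so the explanation for "সিনেমা খুব ভালো" puts nearly all of its weight on ভালো. In practice, scikit-learn's `AdaBoostClassifier` (whose default weak learner is a depth-1 decision tree) together with `LimeTextExplainer` from the `lime` package would replace these helpers.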