A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenges of label overlap, data scarcity, and limited model generalization in detecting cyberbullying in Bengali social media content. To this end, the authors propose a novel multi-label classification architecture that integrates BanglaBERT-Large with a two-layer stacked LSTM to jointly capture deep semantic context and sequential dependencies. This work represents the first effort to combine a context-aware pretrained language model with sequence modeling for multi-label cyberbullying detection in low-resource Bengali. The approach is further enhanced by a class-imbalance-aware sampling strategy and rigorous 5-fold cross-validation. Evaluated on a public dataset, the model demonstrates consistently strong performance across multiple metrics—including F1 score, AUC-ROC, and Cohen’s kappa—significantly improving both generalization capability and multi-label prediction accuracy.
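The key distinction in the summary above is multi-label output: each abuse type is decided independently, so one comment can carry several labels at once. A minimal stdlib sketch of such a decision head, assuming the four labels from the dataset and a 0.5 sigmoid threshold (the paper's actual head and threshold may differ):

```python
import math

# Label set from the dataset described in the paper.
LABELS = ["cyberbully", "sexual harassment", "threat", "spam"]

def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Multi-label decision: each label fires independently, so a
    single comment can carry overlapping abuse types."""
    return [name for name, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]

# Hypothetical logits for a comment that is both harassing and threatening.
print(predict_labels([-1.2, 0.8, 2.1, -3.0]))
# → ['sexual harassment', 'threat']
```

This is what makes the task differ from single-label classification, where a softmax would force exactly one of the four types per comment.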

📝 Abstract
Cyberbullying has become a serious and growing concern in today's virtual world. When left unnoticed, it can have adverse consequences for social and mental health. Researchers have explored various types of cyberbullying, but most approaches use single-label classification, assuming that each comment contains only one type of abuse. In reality, a single comment may include overlapping forms such as threats, hate speech, and harassment, so multi-label detection is both realistic and essential. However, multi-label cyberbullying detection has received limited attention, especially in low-resource languages like Bangla, where robust pre-trained models are scarce, and developing a generalized model with even moderate accuracy remains challenging. Transformers offer strong contextual understanding but may miss sequential dependencies, while LSTM models capture temporal flow but lack semantic depth. To address these limitations, we propose a fusion architecture that combines BanglaBERT-Large with a two-layer stacked LSTM, analyzing their behavior to jointly model context and sequence. The model is fine-tuned and evaluated on a publicly available multi-label Bangla cyberbullying dataset covering cyberbully, sexual harassment, threat, and spam. We apply different sampling strategies to address class imbalance. Evaluation uses multiple metrics, including accuracy, precision, recall, F1-score, Hamming loss, Cohen's kappa, and AUC-ROC, and we employ 5-fold cross-validation to assess the generalization of the architecture.
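Two of the metrics named in the abstract, Hamming loss and (micro-averaged) F1, can be sketched from scratch with the standard library, assuming labels are given as binary indicator vectors per comment (function names and the toy data are illustrative, not from the paper):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted incorrectly, averaged over
    all samples and all labels (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / total

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true positives, false positives, and
    false negatives over every (sample, label) slot before averaging."""
    pairs = [(t, p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Two comments, four labels each (cyberbully, harassment, threat, spam).
truth = [[1, 0, 1, 0], [0, 1, 0, 0]]
preds = [[1, 0, 0, 0], [0, 1, 0, 1]]
print(hamming_loss(truth, preds))  # → 0.25  (2 wrong slots out of 8)
print(micro_f1(truth, preds))      # 2*2 / (2*2 + 1 + 1) ≈ 0.667
```

Hamming loss is the natural multi-label counterpart of error rate: unlike subset accuracy, it gives partial credit when only some of a comment's labels are wrong.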
Problem

Research questions and friction points this paper is trying to address.

multi-label cyberbullying detection
low-resource language
Bangla
class imbalance
contextual understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

BanglaBERT
multi-label classification
stacked LSTM
cyberbullying detection
low-resource language
Mirza Raquib
Department of Computer and Communication Engineering, International Islamic University Chittagong, Chattogram, Bangladesh; Department of Information and Communication Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
Asif Pervez Polok
mPower Social Enterprise
Kedar Nath Biswas
Department of Information and Communication Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
Rahat Uddin Azad
Department of Software Engineering, Daffodil International University, Bangladesh
Saydul Akbar Murad
PhD Student, University of Southern Mississippi
Machine Learning · BCI · Neuroscience · EEG · P2P Communication
Nick Rahimi
Associate Professor, University of Southern Mississippi
Cybersecurity · Trustworthy AI · Distributed Systems · P2P Network