🤖 AI Summary
Detecting covert, context-dependent harmful comments—such as irony and sarcasm—in social media remains challenging due to semantic ambiguity and severe class imbalance. To address these issues, this paper proposes a BERT-BiLSTM cascaded model: it jointly fine-tunes BERT’s deep contextualized semantic representations with BiLSTM’s sequential modeling capability, thereby enhancing discrimination of nuanced linguistic phenomena and mitigating classifier bias induced by data imbalance. Evaluated on the Jigsaw Unintended Bias in Toxicity Classification dataset, the model achieves 0.94 accuracy, 0.94 precision, and 0.93 recall—outperforming standalone BERT, TextCNN, TextRNN, and TF-IDF–based traditional machine learning baselines. The architecture offers a transferable, end-to-end solution for fine-grained harmful content detection in low-resource, high-noise environments.
📝 Abstract
This study aims to develop an efficient and accurate model for detecting malicious comments, addressing the increasingly severe problem of false and harmful content on social media platforms. We propose a deep learning model that combines BERT and BiLSTM: the pre-trained BERT model captures deep semantic features of text, while the BiLSTM network, which excels at processing sequential data, further models the text's contextual dependencies. Experimental results on the Jigsaw Unintended Bias in Toxicity Classification dataset demonstrate that the BERT+BiLSTM model achieves superior performance on the malicious comment detection task, with a precision of 0.94, recall of 0.93, and accuracy of 0.94, surpassing other models, including standalone BERT, TextCNN, TextRNN, and traditional machine learning algorithms using TF-IDF features. These results confirm the BERT+BiLSTM model's advantages in handling imbalanced data and capturing the deep semantic features of malicious comments, providing an effective technical means for social media content moderation and for fostering a healthier online environment.
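The cascade described above—a pre-trained BERT encoder producing per-token contextual representations, a BiLSTM modeling sequential dependencies over them, and a classification head on top—can be sketched as follows. This is a minimal illustration, not the paper's implementation: layer sizes are hypothetical, and a small randomly initialized Transformer layer stands in for the pre-trained BERT encoder (in practice one would load and jointly fine-tune a real pre-trained checkpoint, e.g. via the `transformers` library).

```python
# Sketch of a BERT+BiLSTM toxic-comment classifier (hypothetical sizes).
import torch
import torch.nn as nn

class BertBiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, lstm_hidden=256,
                 num_classes=2):
        super().__init__()
        # Stand-in for BERT: an embedding plus one Transformer encoder
        # layer. A real system would fine-tune a pre-trained BERT here.
        self.encoder = nn.Sequential(
            nn.Embedding(vocab_size, hidden),
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8,
                                       batch_first=True),
        )
        # BiLSTM models forward and backward contextual dependencies
        # over the encoder's token-level representations.
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # Classification head over the concatenated final forward and
        # backward hidden states.
        self.fc = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids):
        x = self.encoder(input_ids)           # (B, T, hidden)
        _, (h_n, _) = self.bilstm(x)          # h_n: (2, B, lstm_hidden)
        pooled = torch.cat([h_n[0], h_n[1]], dim=-1)  # (B, 2*lstm_hidden)
        return self.fc(pooled)                # (B, num_classes) logits

model = BertBiLSTMClassifier()
# Dummy batch: 4 comments, 16 token ids each.
logits = model(torch.randint(0, 30522, (4, 16)))
print(logits.shape)  # torch.Size([4, 2])
```

Taking the last hidden states of both LSTM directions as the pooled representation is one common design choice; the logits would then be trained with a (possibly class-weighted) cross-entropy loss to counter the dataset's class imbalance.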