AI Enabled User-Specific Cyberbullying Severity Detection with Explainability

📅 2025-03-04
🏛️ arXiv.org
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Cyberbullying (CB) is escalating, yet existing detection models largely overlook victim-specific individual differences, hindering accurate severity assessment. To address this, we propose the first user-specific CB severity detection framework that integrates psychological traits (e.g., depression, self-esteem, anxiety), behavioral patterns (e.g., online usage habits), demographic attributes (e.g., race, gender, prior bullying exposure), and comment text for fine-grained three-class classification: non-bullying, mild, and severe. Our approach innovatively introduces a user-driven relabeling mechanism and is the first to embed multidimensional individual attributes into the modeling pipeline. We fuse 146 heterogeneous features—including Word2Vec and LSTM-based textual representations, sentiment and topic features—and employ SHAP and LIME for model interpretability. The framework achieves 98% accuracy and an F1-score of 0.97, effectively identifying vulnerability markers—such as low self-esteem and high anxiety—and critical risk factors.
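The 146-feature fusion described above can be illustrated with a minimal sketch. The exact split across feature groups is not given in the summary, so the dimensions below (100-dim Word2Vec embedding, 3 sentiment scores, 20 topic proportions, 23 user-level attributes) are assumptions chosen only to total 146:

```python
# Hypothetical sketch of fusing heterogeneous feature groups into one flat
# vector; feature names and per-group dimensions are assumptions, not from the paper.
import random

random.seed(0)

def fuse_features(word2vec_vec, sentiment_vec, topic_vec, user_vec):
    """Concatenate heterogeneous feature groups into a single flat vector."""
    return list(word2vec_vec) + list(sentiment_vec) + list(topic_vec) + list(user_vec)

w2v   = [random.random() for _ in range(100)]  # text embedding (assumed 100-dim)
sent  = [0.1, 0.7, 0.2]                        # e.g. negative/neutral/positive scores
topic = [random.random() for _ in range(20)]   # topic-model proportions (assumed)
user  = [random.random() for _ in range(23)]   # psychological + demographic + behavioral

x = fuse_features(w2v, sent, topic, user)
assert len(x) == 146  # matches the paper's reported feature count
```

In the actual pipeline this vector would feed the LSTM-based classifier; here the fusion step alone is shown.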

📝 Abstract
The rise of social media has significantly increased the prevalence of cyberbullying (CB), posing serious risks to both mental and physical well-being. Effective detection systems are essential for mitigating its impact. While several machine learning (ML) models have been developed, few incorporate victims' psychological, demographic, and behavioral factors alongside bullying comments to assess severity. In this study, we propose an AI model integrating user-specific attributes, including psychological factors (self-esteem, anxiety, depression), online behavior (internet usage, disciplinary history), and demographic attributes (race, gender, ethnicity), along with social media comments. Additionally, we introduce a re-labeling technique that categorizes social media comments into three severity levels: Not Bullying, Mild Bullying, and Severe Bullying, taking user-specific factors into account. Our LSTM model is trained on 146 features, incorporating emotional, topical, and Word2Vec representations of social media comments as well as user-level attributes, and it outperforms existing baseline models, achieving the highest accuracy of 98% and an F1-score of 0.97. To identify the key factors influencing cyberbullying severity, we employ explainable AI techniques (SHAP and LIME) to interpret the model's decision-making process. Our findings reveal that, beyond hate comments, victims belonging to specific racial and gender groups are more frequently targeted and exhibit higher incidences of depression, disciplinary issues, and low self-esteem. Additionally, individuals with a prior history of bullying are at greater risk of becoming victims of cyberbullying.
Problem

Research questions and friction points this paper is trying to address.

Detect cyberbullying severity using user-specific attributes and comments
Classify social media comments into three severity levels
Explain model decisions with AI techniques (SHAP and LIME)
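The explainability goal above can be illustrated with a toy, model-agnostic attribution in the spirit of SHAP and LIME. The paper uses the real `shap` and `lime` libraries; this hypothetical sketch only shows the underlying idea of scoring each input feature by how much removing it changes the prediction, with stand-in feature names and weights:

```python
# Toy occlusion-style attribution (SHAP/LIME flavor). All feature names and
# weights here are illustrative assumptions, not the paper's trained model.

def predict_severity(features):
    """Stand-in model: weighted sum of assumed risk features -> severity score."""
    weights = {"hate_term_score": 0.5, "low_self_esteem": 0.3,
               "high_anxiety": 0.15, "prior_bullying": 0.05}
    return sum(weights[name] * value for name, value in features.items())

def occlusion_attributions(features, baseline=0.0):
    """Each feature's contribution = score drop when it is set to a baseline."""
    full_score = predict_severity(features)
    attributions = {}
    for name in features:
        perturbed = dict(features, **{name: baseline})
        attributions[name] = full_score - predict_severity(perturbed)
    return attributions

victim = {"hate_term_score": 1.0, "low_self_esteem": 1.0,
          "high_anxiety": 1.0, "prior_bullying": 0.0}
scores = occlusion_attributions(victim)
assert max(scores, key=scores.get) == "hate_term_score"
```

Real SHAP and LIME are far more principled (Shapley values, local surrogate models), but both reduce a black-box prediction to per-feature contributions like this.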
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI model integrates user-specific psychological and demographic attributes
Re-labeling technique categorizes comments into three severity levels
Explainable AI techniques (SHAP, LIME) interpret model decisions
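The user-driven re-labeling contribution can be sketched as a simple escalation rule. The paper's actual criteria are not reproduced here, so the thresholds, attribute names, and one-level escalation below are assumptions for illustration only:

```python
# Hypothetical sketch of user-driven re-labeling: a text-only label is
# escalated one level when the victim's profile marks them as vulnerable.
NOT_BULLYING, MILD, SEVERE = "Not Bullying", "Mild Bullying", "Severe Bullying"

def relabel(text_label, user):
    """Escalate a mild text-only label to severe for vulnerable users."""
    vulnerable = (user.get("self_esteem", 1.0) < 0.3      # assumed threshold
                  or user.get("anxiety", 0.0) > 0.7       # assumed threshold
                  or user.get("prior_bullying", False))
    if text_label == MILD and vulnerable:
        return SEVERE
    return text_label

# A comment labeled mild from text alone becomes severe for a low-self-esteem user.
assert relabel(MILD, {"self_esteem": 0.2}) == SEVERE
assert relabel(MILD, {"self_esteem": 0.8}) == MILD
assert relabel(NOT_BULLYING, {"anxiety": 0.9}) == NOT_BULLYING
```

The design choice is that severity is a property of the victim-comment pair, not of the comment text alone, which is what distinguishes this framework from text-only detectors.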
Tabia Tanzin Prama
PhD Student in Computer Science
Data Mining · NLP · Health Informatics · AI Ethics
Jannatul Ferdaws Amrin
Computer Science and Engineering, Jahangirnagar University
Md. Mushfique Anwar
Computer Science and Engineering, Jahangirnagar University
Iqbal H. Sarker
Centre for Securing Digital Futures, Edith Cowan University