A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases

📅 2024-10-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses critical challenges in social media misinformation detection, including poor model generalizability and unreliable evaluation, for fake news, spam, and fake accounts. We systematically review and empirically evaluate 36 studies applying machine learning and deep learning methods. For the first time, we apply the PROBAST tool to quantify systematic biases across the full ML lifecycle (data selection, class-imbalance mitigation, linguistic preprocessing, and hyperparameter tuning), revealing pervasive bias throughout. We propose a real-world-oriented evaluation paradigm: (i) resampling techniques to mitigate dataset bias; (ii) adoption of F1-score and AUROC, rather than accuracy, as primary metrics; and (iii) explicit emphasis on critical preprocessing steps such as negation handling. Experimental results demonstrate that standardized preprocessing and robust evaluation substantially improve model reliability and cross-platform generalization.
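To see why the summary's point (ii) matters, here is a minimal pure-Python sketch with invented toy labels: on an imbalanced test set, a degenerate classifier that always predicts the majority "genuine" class scores high accuracy yet never detects a single fake post, which F1 exposes immediately.

```python
# Toy imbalanced test set: 95 genuine posts (0) and 5 fake posts (1).
y_true = [0] * 95 + [1] * 5

# A degenerate "majority-class" classifier that never flags anything as fake.
y_pred = [0] * 100

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    # Precision/recall computed for the minority (fake) class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy(y_true, y_pred))  # 0.95: looks strong
print(f1_score(y_true, y_pred))  # 0.0: the detector finds no fake posts at all
```

The same gap motivates AUROC, which is likewise insensitive to the base rate of the positive class.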

📝 Abstract
Social media platforms like Twitter, Facebook, and Instagram have facilitated the spread of misinformation, necessitating automated detection systems. This systematic review evaluates 36 studies that apply machine learning (ML) and deep learning (DL) models to detect fake news, spam, and fake accounts on social media. Using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), the review identified key biases across the ML lifecycle: selection bias due to non-representative sampling, inadequate handling of class imbalance, insufficient linguistic preprocessing (e.g., negations), and inconsistent hyperparameter tuning. Although models such as Support Vector Machines (SVM), Random Forests, and Long Short-Term Memory (LSTM) networks showed strong potential, over-reliance on accuracy as an evaluation metric in imbalanced data settings was a common flaw. The review highlights the need for improved data preprocessing (e.g., resampling techniques), consistent hyperparameter tuning, and the use of appropriate metrics like precision, recall, F1 score, and AUROC. Addressing these limitations can lead to more reliable and generalizable ML/DL models for detecting deceptive content, ultimately contributing to the reduction of misinformation on social media.
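The abstract singles out negation handling as an often-skipped preprocessing step. A common scope-marking technique (illustrative here, not necessarily the exact variant any reviewed study used) appends a `_NEG` suffix to tokens following a negation cue until the next punctuation mark, so that "not fake" and "fake" do not collapse to the same bag-of-words features:

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}

def mark_negation(text):
    """Append _NEG to tokens after a negation cue, until punctuation."""
    tokens = re.findall(r"\w+|[.,!?;]", text.lower())
    out, in_scope = [], False
    for tok in tokens:
        if tok in ".,!?;":
            in_scope = False       # punctuation closes the negation scope
            out.append(tok)
        elif tok in NEGATIONS:
            in_scope = True        # open a new negation scope
            out.append(tok)
        else:
            out.append(tok + "_NEG" if in_scope else tok)
    return out

print(mark_negation("This is not fake news, trust me."))
# ['this', 'is', 'not', 'fake_NEG', 'news_NEG', ',', 'trust', 'me', '.']
```

NLTK ships a similar utility (`nltk.sentiment.util.mark_negation`) for pipelines that already depend on it.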
Problem

Research questions and friction points this paper is trying to address.

Detecting fake news, spam, and fake accounts on social media.
Identifying biases in machine learning lifecycle for detection.
Improving data preprocessing and evaluation metrics for reliability.
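The preprocessing fix named above, resampling to counter class imbalance, can be sketched in its simplest form: random oversampling, which duplicates minority-class examples until the classes are balanced. This is one illustrative resampling technique, not the specific method of any reviewed study.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Duplicate minority-class examples (with replacement) until every
    class matches the size of the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    target = max(len(group) for group in by_label.values())
    out = []
    for y, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out.extend((s, y) for s in group + extra)
    rng.shuffle(out)  # avoid blocks of a single class in the training order
    return [s for s, _ in out], [y for _, y in out]

texts = ["real1", "real2", "real3", "real4", "fake1"]
labels = [0, 0, 0, 0, 1]
bal_x, bal_y = oversample_minority(texts, labels)
print(sorted(bal_y))  # four 0s and four 1s
```

More sophisticated options, such as SMOTE from the imbalanced-learn library, synthesize new minority examples instead of duplicating them; either way, resampling must be applied only to the training split to avoid leaking duplicates into evaluation.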
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning for deceptive content detection
Deep learning models for fake news identification
Improved data preprocessing and hyperparameter tuning