🤖 AI Summary
This work addresses the detection of hate and offensive language in English tweets on X (formerly Twitter), proposing a zero-shot and few-shot prompting approach that leverages GPT-3.5 Turbo, without any fine-tuning, to perform binary classification (Hate and Offensive vs. Non Hate-Offensive). Carefully engineered prompts guide the model in interpreting context-sensitive social media text, with Macro-F1 as the primary evaluation metric. Three independent experimental runs yield stable scores (0.756, 0.751, and 0.754), indicating balanced precision and recall across both classes. The key contribution is a demonstration that prompting a large language model can deliver robust, consistent hate speech detection without labeled training data or model adaptation.
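The paper does not publish its prompts or code, but the described setup maps onto a straightforward zero-shot call to the chat-completions API. The sketch below is a minimal illustration under assumptions: the exact prompt wording, the `HOF`/`NOT` label strings, and the decoding parameters are all placeholders, not the authors' actual configuration.

```python
# Minimal zero-shot prompting sketch (assumed setup; the paper's exact
# prompt text, labels, and parameters are not published).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a content moderation assistant. Classify the tweet as "
    "'HOF' (Hate and Offensive) or 'NOT' (Non Hate-Offensive). "
    "Answer with exactly one label: HOF or NOT."
)

def classify_tweet(tweet: str) -> str:
    """Return 'HOF' or 'NOT' for a single tweet via zero-shot prompting."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # low-variance decoding, in line with the stable runs reported
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Tweet: {tweet}"},
        ],
    )
    answer = response.choices[0].message.content.strip().upper()
    return "HOF" if "HOF" in answer else "NOT"

print(classify_tweet("I can't believe how rude people are online these days."))
```

A few-shot variant would simply prepend a handful of labeled example tweets as alternating user/assistant messages before the target tweet.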
📝 Abstract
The widespread use of social media platforms such as Twitter and Facebook has enabled people of all ages to share their thoughts and experiences, leading to an immense accumulation of user-generated content. Alongside these benefits, however, the platforms face the challenge of managing hate speech and offensive content, which can undermine rational discourse and threaten democratic values. There is therefore a growing need for automated methods to detect and mitigate such content, especially given the complexity of conversations that may require contextual analysis across multiple languages, including code-mixed languages such as Hinglish, German-English, and Bangla. We participated in the English task, which requires classifying English tweets into two categories: Hate and Offensive, and Non Hate-Offensive. We experiment with a state-of-the-art large language model, GPT-3.5 Turbo, using prompting alone to perform this classification. Performance is evaluated with the Macro-F1 score, which balances precision and recall across all classes, over three distinct runs. The scores obtained are 0.756 for run 1, 0.751 for run 2, and 0.754 for run 3, indicating strong performance with minimal variance among the runs; run 1 achieves the highest score. These findings highlight the robustness and consistency of the model across different runs.
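For reference, the Macro-F1 metric used throughout is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of class imbalance. A short scikit-learn sketch is shown below; the labels are illustrative placeholders, not the shared-task data.

```python
# Macro-F1 sketch with scikit-learn; y_true/y_pred are toy placeholders.
from sklearn.metrics import f1_score

y_true = ["HOF", "NOT", "NOT", "HOF", "NOT"]
y_pred = ["HOF", "NOT", "HOF", "HOF", "NOT"]

# average="macro" computes F1 per class, then takes the unweighted mean,
# so the minority class is weighted equally with the majority class.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"Macro-F1: {macro_f1:.3f}")
```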