GS_DravidianLangTech@2025: Women Targeted Abusive Texts Detection on Social Media

📅 2025-04-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the detection of abusive text targeting women, including hate speech, derogatory language, and threats, in social media content written in Tamil and Malayalam, two low-resource Dravidian languages. The paper trains logistic regression and BERT as base models on the annotated DravidianLangTech@2025 datasets and evaluates them on the corresponding test sets. The reported macro F1 scores are 0.729 for BERT on Tamil and 0.6279 for logistic regression on Malayalam. The work contributes benchmark results for detecting gendered abuse in South Indian languages.

📝 Abstract

The increasing misuse of social media has become a concern; however, technological solutions are being developed to moderate its content effectively. This paper focuses on detecting abusive texts targeting women on social media platforms. Abusive speech refers to communication intended to harm or incite hatred against vulnerable individuals or groups. Specifically, this study aims to identify abusive language directed toward women. To achieve this, we trained logistic regression and BERT as base models on datasets sourced from DravidianLangTech@2025 for the Tamil and Malayalam languages. The models were evaluated on the test datasets, yielding macro F1 scores of 0.729 for BERT on Tamil and 0.6279 for logistic regression on Malayalam.
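
The abstract reports macro F1, which is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The sketch below shows how that metric is computed for a binary task. The label encoding (1 for abusive, 0 for non-abusive) and the toy predictions are assumptions for illustration only and do not come from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold labels and predictions (1 = abusive, 0 = non-abusive)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

In this toy case the abusive class has F1 = 2/3 and the non-abusive class has F1 = 0.8, so the macro average sits between them rather than being dominated by the larger class.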

Problem

Research questions and friction points this paper is trying to address.

Detect abusive texts targeting women on social media

Identify harmful language in Tamil and Malayalam

Evaluate models for abusive speech detection performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Used logistic regression for abusive text detection

Employed BERT model for enhanced accuracy

Trained on DravidianLangTech@2025 Tamil and Malayalam datasets
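
To make the logistic regression baseline listed above concrete, here is a minimal sketch of a bag-of-words logistic regression classifier trained with stochastic gradient descent. The page does not say which features, tokenizer, or optimizer the authors used, so the whitespace tokenizer, the count features, the hyperparameters, and the helper names `train_logreg` and `predict_proba` are all assumptions. The training strings are neutral placeholder tokens, not data from the shared task.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer (assumption); real Tamil/Malayalam text needs more care."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Fit bag-of-words logistic regression with plain SGD on the log loss."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            counts = Counter(tokenize(text))
            z = bias + sum(weights[vocab[t]] * c for t, c in counts.items())
            err = sigmoid(z) - y  # gradient of the log loss with respect to z
            bias -= lr * err
            for t, c in counts.items():
                weights[vocab[t]] -= lr * err * c
    return vocab, weights, bias


def predict_proba(model, text):
    """Probability of the positive class; tokens outside the vocabulary are ignored."""
    vocab, weights, bias = model
    counts = Counter(tokenize(text))
    z = bias + sum(weights[vocab[t]] * c for t, c in counts.items() if t in vocab)
    return sigmoid(z)


# Placeholder training set: label 1 = abusive, 0 = non-abusive (hypothetical encoding)
texts = ["tok_a tok_b", "tok_a tok_c", "tok_d tok_e", "tok_d tok_f"]
labels = [1, 1, 0, 0]
model = train_logreg(texts, labels)
print(predict_proba(model, "tok_a"), predict_proba(model, "tok_d"))
```

A BERT-based classifier would replace the count features with contextual embeddings and a classification head, but the decision rule (threshold a predicted probability) stays the same.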
