BIDWESH: A Bangla Regional Based Hate Speech Detection Dataset

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Bangla hate speech detection methods overlook the informal expressions and culturally embedded contexts of regional dialects, particularly those spoken in Barishal, Noakhali, and Chittagong, which leads to weaker detection and biased content moderation. To address this, we introduce BIDWESH, the first multi-dialectal hate speech dataset for Bangla, comprising 9,183 manually annotated instances covering these three dialects. The dataset is built by translating entries from the BD-SHS corpus into each dialect and applying a fine-grained, multi-label annotation scheme that records hate presence, hate type (slander, gender, religion, call to violence), and target group (individual, male, female, group), with human verification for linguistic accuracy and contextual coherence. BIDWESH is intended to improve model sensitivity to non-standard lexical forms and culturally nuanced hateful content. It fills a gap in low-resource dialectal NLP for fair and precise content moderation and provides a benchmark for dialect-aware hate speech detection.

📝 Abstract
Hate speech on digital platforms has become a growing concern globally, especially in linguistically diverse countries like Bangladesh, where regional dialects play a major role in everyday communication. Despite progress in hate speech detection for standard Bangla, existing datasets and systems fail to address the informal and culturally rich expressions found in dialects such as Barishal, Noakhali, and Chittagong. This oversight results in limited detection capability and biased moderation, leaving large sections of harmful content unaccounted for. To address this gap, this study introduces BIDWESH, the first multi-dialectal Bangla hate speech dataset, constructed by translating and annotating 9,183 instances from the BD-SHS corpus into three major regional dialects. Each entry was manually verified and labeled for hate presence, type (slander, gender, religion, call to violence), and target group (individual, male, female, group), ensuring linguistic and contextual accuracy. The resulting dataset provides a linguistically rich, balanced, and inclusive resource for advancing hate speech detection in Bangla. BIDWESH lays the groundwork for the development of dialect-sensitive NLP tools and contributes significantly to equitable and context-aware content moderation in low-resource language settings.
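The annotation scheme described in the abstract can be pictured as a small record type. The following is a minimal sketch, not the dataset's actual file format: the class name `Entry`, the field names, and the lowercase label strings are all hypothetical, and the rule that non-hate entries carry no type or target is an assumption made for illustration. Only the three dialects, four hate types, and four target groups come from the abstract.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Label vocabularies taken from the abstract; the string spellings are assumed.
DIALECTS = frozenset({"barishal", "noakhali", "chittagong"})
HATE_TYPES = frozenset({"slander", "gender", "religion", "call_to_violence"})
TARGETS = frozenset({"individual", "male", "female", "group"})


@dataclass(frozen=True)
class Entry:
    """One annotated instance (hypothetical schema, not the released format)."""
    text: str                                 # dialect translation of a BD-SHS sentence
    dialect: str                              # one of DIALECTS
    is_hate: bool                             # hate presence label
    hate_types: FrozenSet[str] = frozenset()  # a set, so several types can co-occur
    target: Optional[str] = None              # one of TARGETS, or None

    def __post_init__(self) -> None:
        if self.dialect not in DIALECTS:
            raise ValueError(f"unknown dialect: {self.dialect!r}")
        unknown = set(self.hate_types) - HATE_TYPES
        if unknown:
            raise ValueError(f"unknown hate types: {sorted(unknown)}")
        if self.target is not None and self.target not in TARGETS:
            raise ValueError(f"unknown target: {self.target!r}")
        # Assumption: type and target are only annotated when hate is present.
        if not self.is_hate and (self.hate_types or self.target is not None):
            raise ValueError("non-hate entries should not carry type or target")


# Usage with placeholder text (not a real dataset row).
example = Entry(
    text="<dialect sentence>",
    dialect="noakhali",
    is_hate=True,
    hate_types=frozenset({"religion"}),
    target="group",
)
```

Validating labels at construction time keeps malformed rows from silently entering a training split, which matters when the same source sentence appears once per dialect.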
Problem

Research questions and friction points this paper is trying to address.

Detects hate speech in regional Bangla dialects
Addresses gaps in informal dialectal expression datasets
Improves content moderation for low-resource languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

First multi-dialectal Bangla hate speech dataset
Manually verified and annotated 9,183 dialect instances
Enables dialect-sensitive NLP tools development
Azizul Hakim Fayaz
Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh
MD. Shorif Uddin
Department of Computer Science and Engineering, Southeast University
Artificial Intelligence, Natural Language Processing (NLP), Machine Learning (ML), Security
Rayhan Uddin Bhuiyan
Southeast University
Artificial Intelligence, Natural Language Processing (NLP), Machine Learning (ML), Deep Learning, Co
Zakia Sultana
Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh
Md. Samiul Islam
Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh
Bidyarthi Paul
Adjunct Lecturer, Southeast University
Natural Language Processing, LLM, GenAi
Tashreef Muhammad
Lecturer of Computer Science and Engineering, Southeast University, Bangladesh
Machine Learning, Metaheuristic, Stock Market
Shahriar Manzoor
Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh