BTPD: A Multilingual Hand-curated Dataset of Bengali Transnational Political Discourse Across Online Communities

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Research on transnational political discourse in Bengali has long been hindered by the scarcity of high-quality annotated data. Method: This paper introduces the first cross-platform, community-driven, multilingual annotated dataset for political discourse in Bengali. The authors propose a community-informed keyword retrieval strategy and conduct multi-stage human annotation covering fine-grained linguistic variants (Bengali, English, and Romanized Bengali) across three structurally heterogeneous platforms: Reddit, Facebook, and Telegram. Contribution/Results: The dataset comprises tens of thousands of posts centered on core political themes, including political identity, immigration policy, and nationalism. It is the first resource to enable systematic, cross-platform comparative analysis of political discourse in a low-resource language, thereby filling a critical gap in Bengali political corpora. Designed for reproducibility and scalability, it provides a foundational resource for multilingual social computing and transnational political communication research.

📝 Abstract
Understanding political discourse in online spaces is crucial for analyzing public opinion and ideological polarization. While social computing and computational linguistics have explored such discussions in English, such research efforts are significantly limited in major yet under-resourced languages like Bengali due to the unavailability of datasets. In this paper, we present a multilingual dataset of Bengali transnational political discourse (BTPD) collected from three online platforms, each representing distinct community structures and interaction dynamics. Besides describing how we hand-curated the dataset through community-informed keyword-based retrieval, this paper also provides a general overview of its topics and multilingual content.

Problem

Research questions and friction points this paper is trying to address.

Lack of Bengali political discourse datasets for analysis
Limited research on under-resourced languages like Bengali
Need for multilingual dataset from diverse online platforms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hand-curated multilingual Bengali political dataset
Community-informed keyword-based data retrieval
Covers diverse online platforms and topics
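
The keyword-based retrieval idea listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the seed keywords, the function names (`script_of`, `matches_keyword`, `retrieve`), and the script labels are all hypothetical, and the paper's actual community-informed keyword list is not given here. The sketch only tags the writing system of each matched post (Bengali script, Latin script, or mixed); telling Romanized Bengali apart from English would need a lexicon or human annotation, which this example does not attempt.

```python
import re
import unicodedata

# Hypothetical seed keywords for illustration only; not the paper's list.
KEYWORDS = ["নির্বাচন", "nirbachon", "election"]

# Bengali Unicode block (U+0980 to U+09FF) and basic Latin letters.
BENGALI_CHARS = re.compile(r"[\u0980-\u09FF]")
LATIN_CHARS = re.compile(r"[A-Za-z]")


def normalize(text):
    """Apply NFC normalization and case folding so comparisons are consistent."""
    return unicodedata.normalize("NFC", text).casefold()


def script_of(text):
    """Label the writing system(s) used in a post."""
    has_bengali = bool(BENGALI_CHARS.search(text))
    has_latin = bool(LATIN_CHARS.search(text))
    if has_bengali and has_latin:
        return "mixed"
    if has_bengali:
        return "bengali_script"
    if has_latin:
        return "latin_script"
    return "other"


def matches_keyword(text, keywords=KEYWORDS):
    """Return True if any keyword occurs as a substring of the post."""
    normalized = normalize(text)
    return any(normalize(k) in normalized for k in keywords)


def retrieve(posts, keywords=KEYWORDS):
    """Keep posts that match a keyword, paired with their script label."""
    return [(p, script_of(p)) for p in posts if matches_keyword(p, keywords)]
```

Calling `retrieve` on a list of post strings returns only the posts containing a seed keyword, each paired with its script label. Substring matching is a deliberate simplification; a real collection effort would likely need tokenization and manual review of the matches.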