🤖 AI Summary
Research on transnational political discourse in Bangla has long been hindered by the scarcity of high-quality annotated data.
Method: This paper introduces the first cross-platform, community-driven, multilingual annotated dataset for political discourse in Bangla. We propose a community-informed keyword retrieval strategy and conduct multi-stage human annotation across three structurally heterogeneous platforms: Reddit, Facebook, and Telegram. The annotation covers fine-grained linguistic variants (Bangla, English, and Romanized Bangla).
Contribution/Results: The dataset comprises tens of thousands of posts centered on core political themes—including political identity, immigration policy, and nationalism. It is the first resource to enable systematic, cross-platform comparative analysis of political discourse in a low-resource language, thereby filling a critical gap in Bangla political corpora. Designed for reproducibility and scalability, it provides a foundational resource for multilingual social computing and transnational political communication research.
📝 Abstract
Understanding political discourse in online spaces is crucial for analyzing public opinion and ideological polarization. While social computing and computational linguistics have explored such discussions in English, comparable research remains significantly limited in major yet under-resourced languages like Bengali because suitable datasets are unavailable. In this paper, we present a multilingual dataset of Bengali transnational political discourse (BTPD) collected from three online platforms, each representing distinct community structures and interaction dynamics. In addition to describing how we hand-curated the dataset through community-informed keyword-based retrieval, this paper provides a general overview of its topics and multilingual content.