🤖 AI Summary
This study addresses the challenges of topic diversity and heterogeneous user behavior in Telegram's public groups. We develop the first open-source, automated data collection framework, built on Telethon, and use it to harvest over 50 million multilingual messages from 669 public groups spanning education, politics, cryptocurrency, and adult content. We propose a fine-grained, per-topic analysis that shows, for the first time, how the sharing patterns of videos and stickers differ across topics. Our analysis identifies pathways for distributing possibly illicit content and counterintuitive social behaviors (e.g., significantly lower bot prevalence in adult groups than in political ones), and quantifies statistically significant cross-topic variations in linguistic diversity, bot activity, and multimedia usage intensity. The contributions include an open dataset, an analytical toolkit, and a fully reproducible methodology, which together provide empirical foundations and technical infrastructure for platform governance and cross-topic research on social behavior.
📝 Abstract
Although Telegram is currently one of the most popular instant messaging apps worldwide, it has been largely understudied in recent years. In this paper, we aim to address this gap by presenting an analysis of publicly accessible groups whose discussions cover topics as diverse as Education, Erotic, Politics, and Cryptocurrencies. We engineer and offer an open-source tool to automate the collection of messages from Telegram groups, a non-trivial problem. We use it to collect more than 50 million messages from 669 groups. Here, we present a first-of-its-kind, per-topic analysis, contrasting the characteristics of the messages sent on the platform from different angles: the language, the presence of bots, and the type and volume of shared media content. Our results confirm some anecdotal evidence, e.g., clues that Telegram is used to share possibly illicit content, and unveil some unexpected findings, e.g., the different sharing patterns of videos and stickers in groups of different topics. While our results are preliminary, we hope that our work paves the way for several avenues of future research on the understudied Telegram platform.
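To make the idea of a per-topic comparison of shared media concrete, the following is a minimal sketch, not the authors' tool or pipeline. It assumes each collected message has already been reduced to a record with a hypothetical `topic` field (the label of the group it came from) and a hypothetical `media_type` field (`None` for plain text). The function name and the toy records are invented for illustration only.

```python
from collections import Counter, defaultdict


def media_share_by_topic(messages):
    """Return {topic: {media_type: fraction of that topic's messages}}."""
    counts = defaultdict(Counter)
    for msg in messages:
        # Messages without media are counted under the label "text".
        counts[msg["topic"]][msg.get("media_type") or "text"] += 1

    shares = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        shares[topic] = {kind: n / total for kind, n in counter.items()}
    return shares


# Toy records, invented purely for illustration (not data from the paper).
sample = [
    {"topic": "Education", "media_type": None},
    {"topic": "Education", "media_type": "video"},
    {"topic": "Politics", "media_type": "sticker"},
    {"topic": "Politics", "media_type": "sticker"},
    {"topic": "Politics", "media_type": None},
    {"topic": "Politics", "media_type": "video"},
]
shares = media_share_by_topic(sample)
```

Normalizing by each topic's message count, rather than comparing raw counts, keeps topics with very different volumes comparable, which is presumably what a contrast of sharing patterns across topics requires.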