🤖 AI Summary
To address misinformation detection on low-moderation platforms such as Telegram in the context of German elections, this paper proposes a graph neural network approach that jointly models propagation network structure and textual semantics. The authors construct Misinfo-TeleGraph, the first graph-structured dataset of German-language Telegram messages, where nodes represent messages and edges encode forwarding relationships. To mitigate label scarcity, they use M3-embeddings to compute semantic similarity between messages and fact-checks and news articles, generating weak labels that are used alongside human-annotated strong labels. The best-performing model is GraphSAGE with LSTM aggregation, which combines text features with the forwarding structure. It significantly outperforms text-only baselines in Matthews Correlation Coefficient (MCC) and F1-score. Key contributions include: (1) the first publicly available German Telegram graph dataset; (2) a reproducible benchmark combining weak supervision and graph representation learning; and (3) empirical evidence that network topology improves disinformation detection on under-moderated platforms.
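The weak-labeling idea described above can be illustrated with a minimal sketch. It assumes each message and each fact-check statement has already been embedded as a plain list of floats (a stand-in for M3-embeddings, which are not computed here). The function names `cosine` and `weak_label` and the threshold of 0.8 are hypothetical choices for illustration, not values taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weak_label(message_vec, factcheck_vecs, threshold=0.8):
    """Return (label, best_similarity) for one message.

    label is 1 when the message is at least `threshold`-similar to any
    fact-check embedding, else 0. The threshold is an assumed value.
    """
    best = max((cosine(message_vec, f) for f in factcheck_vecs), default=0.0)
    return (1 if best >= threshold else 0), best


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings", purely illustrative.
    factchecks = [[1.0, 0.0], [0.0, 1.0]]
    print(weak_label([0.9, 0.1], factchecks))
    print(weak_label([0.7, 0.7], factchecks))
```

In practice the threshold controls the trade-off between label coverage and label noise, which is one reason weak labels are usually checked against a smaller human-annotated set.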
📝 Abstract
Connectivity and message propagation are central, yet often underutilized, sources of information in misinformation detection, especially on poorly moderated platforms such as Telegram, which has become a critical channel for misinformation dissemination, particularly in the German electoral context. In this paper, we introduce Misinfo-TeleGraph, the first German-language Telegram-based graph dataset for misinformation detection. It includes over 5 million messages from public channels, enriched with metadata, channel relationships, and both weak and strong labels. The weak labels are derived via semantic similarity to fact-checks and news articles using M3-embeddings; the strong labels come from manual annotation. To establish reproducible baselines, we evaluate both text-only models and graph neural networks (GNNs) that incorporate message forwarding as network structure. Our results show that GraphSAGE with LSTM aggregation significantly outperforms text-only baselines in terms of Matthews Correlation Coefficient (MCC) and F1-score. We further evaluate the impact of subscriber counts, view counts, and automatically generated versus human-created labels on performance, and highlight both the potential and the challenges of weak supervision in this domain. This work provides a reproducible benchmark and open dataset for future research on misinformation detection in German-language Telegram networks and other low-moderation social platforms.
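For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MCC and F1-score are computed from binary predictions. It uses the standard textbook definitions; the helper names and the toy labels are illustrative assumptions and do not reproduce any result from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(tp, fp, fn):
    """F1-score for the positive class; 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Toy labels, purely illustrative.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(f"MCC={mcc(tp, fp, tn, fn):.3f}  F1={f1(tp, fp, fn):.3f}")
```

MCC uses all four cells of the confusion matrix, so it tends to be more informative than F1 on imbalanced data, where misinformation is typically the minority class.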