The Tag is the Signal: URL-Agnostic Credibility Scoring for Messages on Telegram

📅 2026-01-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a weakness of existing credibility classifiers that rely on URL or lexical features: they perform poorly on high-risk Telegram messages, which are short and often contain no hyperlinks. The authors propose TAG2CRED, a framework that assesses credibility from semantic tags rather than URLs. A fine-tuned large language model annotates each message along dimensions such as topic, claim type, call to action, and evidence, and an L2-regularized logistic regression maps these tags to a risk score in the [0,1] interval. Evaluated with URL masking and domain-disjoint splits on 87,936 Telegram messages, TAG2CRED achieves a ROC-AUC of 0.871 and a macro-F1 of 0.787. A stacked ensemble that adds TF-IDF and SBERT representations raises performance to a ROC-AUC of 0.901 and a macro-F1 of 0.813, suggesting that tags and lexical features capture complementary signals for sparse, link-free short texts.
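The tag-to-score step can be illustrated with a small, self-contained sketch. Only the four tag dimensions come from the summary; the tag values in `SCHEMA`, the toy training messages, the one-hot encoding, and the helper names (`encode`, `fit_l2_logreg`, `risk_score`) are hypothetical. Plain full-batch gradient descent stands in for whatever solver the authors actually used, and the tags are supplied by hand here instead of by a fine-tuned LLM.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical tag schema: the four dimensions follow the summary, but the
# values are illustrative placeholders, not the paper's actual label set.
SCHEMA: Dict[str, List[str]] = {
    "topic": ["health", "politics", "finance", "other"],
    "claim_type": ["factual", "opinion", "prediction"],
    "call_to_action": ["none", "share", "buy", "join"],
    "evidence": ["none", "anecdote", "source_cited"],
}

# One binary feature per (dimension, value) pair.
FEATURES = [f"{dim}={val}" for dim, vals in SCHEMA.items() for val in vals]
INDEX = {name: i for i, name in enumerate(FEATURES)}


def encode(tags: Dict[str, str]) -> List[float]:
    """One-hot encode the tag chosen for each dimension (unknown tags raise KeyError)."""
    x = [0.0] * len(FEATURES)
    for dim, val in tags.items():
        x[INDEX[f"{dim}={val}"]] = 1.0
    return x


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logreg(X: List[List[float]], y: List[int], lam: float = 0.1,
                  lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Minimise mean log-loss + (lam / 2) * ||w||^2 with full-batch gradient descent.

    The bias term is left unpenalised, as is common practice.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def risk_score(w: List[float], b: float, tags: Dict[str, str]) -> float:
    """Map a message's tags to a risk score in (0, 1)."""
    x = encode(tags)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, hand-tagged training data (label 1 = low credibility); not from the paper.
train = [
    ({"topic": "health", "claim_type": "factual", "call_to_action": "share", "evidence": "none"}, 1),
    ({"topic": "finance", "claim_type": "prediction", "call_to_action": "buy", "evidence": "none"}, 1),
    ({"topic": "politics", "claim_type": "opinion", "call_to_action": "join", "evidence": "anecdote"}, 1),
    ({"topic": "health", "claim_type": "factual", "call_to_action": "none", "evidence": "source_cited"}, 0),
    ({"topic": "politics", "claim_type": "factual", "call_to_action": "none", "evidence": "source_cited"}, 0),
    ({"topic": "other", "claim_type": "opinion", "call_to_action": "none", "evidence": "anecdote"}, 0),
]
w, b = fit_l2_logreg([encode(t) for t, _ in train], [label for _, label in train])

risky = risk_score(w, b, {"topic": "finance", "claim_type": "prediction",
                          "call_to_action": "share", "evidence": "none"})
safe = risk_score(w, b, {"topic": "health", "claim_type": "factual",
                         "call_to_action": "none", "evidence": "source_cited"})
print(f"risky={risky:.3f} safe={safe:.3f}")
```

Because each dimension contributes only a handful of binary features, a model of this shape stays very small, which is one way to read the abstract's remark that TAG2CRED uses far fewer features than TF-IDF. A logistic model trained on log-loss outputs probabilities directly, so no separate thresholding step is needed to obtain a score in [0,1].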

📝 Abstract

Telegram has become one of the leading platforms for disseminating misinformation. However, many existing pipelines still classify a message's credibility based on the reputation of its associated domain names or on its lexical features. Such methods work well on traditional long-form news articles from well-known sources, but high-risk posts on Telegram are short and URL-sparse, which causes link-based and standard TF-IDF models to fail. To address this, we propose TAG2CRED, a pipeline designed for such short, convoluted messages that scores each post directly from the tags assigned to its text. We designed a concise label system covering the dimensions of theme, claim type, call to action, and evidence. A fine-tuned large language model (LLM) assigns tags to each message, and an L2-regularized logistic regression then maps these tags to calibrated risk scores in the [0,1] interval. We evaluated TAG2CRED on 87,936 Telegram messages associated with Media Bias/Fact Check (MBFC), using URL masking and domain-disjoint splits. TAG2CRED reached a ROC-AUC of 0.871, a macro-F1 of 0.787, and a Brier score of 0.167, outperforming the TF-IDF baseline (macro-F1 0.737, Brier score 0.248) while using far fewer features and generalizing better to infrequent domains. A stacked ensemble (TF-IDF + TAG2CRED + SBERT) improved further over the SBERT baseline, reaching a ROC-AUC of 0.901, a macro-F1 of 0.813, and a Brier score of 0.114. This indicates that style labels and lexical features may capture different but complementary dimensions of information risk.
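The evaluation protocol in the abstract (URL masking, domain-disjoint splits, and the reported metrics) can be sketched as follows. The URL regex, the `[URL]` placeholder token, the 20% test fraction, the 0.5 decision threshold for macro-F1, and all function names are assumptions for illustration; the abstract does not say how masking or splitting was implemented, and this simple regex would not catch bare links such as `t.me/...` without a scheme or `www.` prefix. The metric functions follow the standard definitions of the Brier score, macro-F1, and ROC-AUC.

```python
import random
import re
from typing import List, Sequence, Tuple

# Assumed masking rule: anything starting with http(s):// or www. counts as a URL.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def mask_urls(text: str, token: str = "[URL]") -> str:
    """Replace URL-like spans so a model cannot key on the link itself."""
    return URL_RE.sub(token, text)


def domain_disjoint_split(domains: Sequence[str], test_frac: float = 0.2,
                          seed: int = 0) -> Tuple[List[int], List[int]]:
    """Assign whole domains (not individual messages) to train or test."""
    unique = sorted(set(domains))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(test_frac * len(unique)))
    test_domains = set(unique[:n_test])
    train_idx = [i for i, d in enumerate(domains) if d not in test_domains]
    test_idx = [i for i, d in enumerate(domains) if d in test_domains]
    return train_idx, test_idx


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def macro_f1(y: Sequence[int], y_hat: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for classes 0 and 1."""
    scores = []
    for c in (0, 1):
        tp = sum(1 for a, b in zip(y, y_hat) if a == c and b == c)
        fp = sum(1 for a, b in zip(y, y_hat) if a != c and b == c)
        fn = sum(1 for a, b in zip(y, y_hat) if a == c and b != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


print(mask_urls("Miracle cure! Read more at https://example.com/cure and share"))
# -> Miracle cure! Read more at [URL] and share

y, p = [1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]
y_hat = [1 if pi >= 0.5 else 0 for pi in p]  # assumed 0.5 decision threshold
print(brier(y, p), macro_f1(y, y_hat), roc_auc(y, p))
```

Splitting by domain rather than by message keeps all of a source's messages on one side of the split, so a model cannot score well simply by memorising which sources are unreliable. The Brier score is reported alongside the ranking metrics because, unlike ROC-AUC, it also penalises poorly calibrated probabilities.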

Problem

Research questions and friction points this paper is trying to address.

misinformation

credibility scoring

short messages

URL-sparse

Innovation

Methods, ideas, or system contributions that make the work stand out.

TAG2CRED

credibility scoring

Telegram misinformation

LLM-based tagging