Prediction of Reposting on X

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: On X (formerly Twitter), conventional supervised methods for predicting whether a user will repost a message generalize poorly out of distribution (OOD): when tested on topics unseen during training, F1 drops to 0.24, because these methods rely mainly on topic-specific content features. Method: The paper proposes a user-centric OOD classification framework that supplements or replaces content-based signals with intrinsic user attributes (e.g., profile bio, follower count) and historical interaction patterns (e.g., cross-topic repost frequency, preference stability). These features are fed into standard classifiers (XGBoost/MLP). Contribution/Results: In cross-topic OOD evaluation, the approach achieves F1 = 0.70, an absolute gain of 0.46 over the content-based baseline. This suggests that stable user behavioral signals transfer across topics and that a significant component of reposting behavior is independent of message content.
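For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1 score is computed from binary labels. It assumes F1 is taken over the positive class (1 = reposted); the source does not state which averaging variant was used, and the function name `f1_score` and the toy labels are illustrative only.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = reposted), from binary label lists.

    Assumption: positive-class F1; the paper may use a different variant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not data from the paper).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, a model that predicts reposts well only on familiar topics can score very low once either quantity collapses on unseen topics.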

📝 Abstract
There have been considerable efforts to predict a user's reposting behaviour on X (formerly Twitter) using machine learning models. The problem has previously been cast as a supervised classification task, where Tweets are randomly assigned to a test or training set. The random assignment helps to ensure that the test and training sets are drawn from the same distribution. In practice, we would like to predict users' reposting behaviour for a set of messages related to a new, previously unseen, topic (defined by a hashtag). In this case, the problem becomes an out-of-distribution generalisation classification task. Experimental results reveal that while existing algorithms, which predominantly use features derived from the content of Tweet messages, perform well when the training and test distributions are the same, these algorithms perform much worse when the test set is out of distribution. We then show that if the message features are supplemented or replaced with features derived from users' profiles and past behaviour, the out-of-distribution prediction is greatly improved, with the F1 score increasing from 0.24 to 0.70. Our experimental results suggest that a significant component of reposting behaviour can be predicted based on users' profiles and past behaviour, and is independent of the content of messages.
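The difference between the two evaluation settings described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record layout (`user`, `hashtag`, `reposted`), the function names `random_split` and `topic_holdout_split`, and the toy data are all hypothetical.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """In-distribution setting: shuffle all records, then cut.

    Every hashtag can appear on both sides of the split.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def topic_holdout_split(records, held_out_hashtags):
    """Out-of-distribution setting: whole topics are withheld.

    No hashtag in the test set is ever seen during training.
    """
    held_out = set(held_out_hashtags)
    train = [r for r in records if r["hashtag"] not in held_out]
    test = [r for r in records if r["hashtag"] in held_out]
    return train, test


# Toy records (hypothetical, not data from the paper).
records = [
    {"user": "u1", "hashtag": "#a", "reposted": 1},
    {"user": "u2", "hashtag": "#a", "reposted": 0},
    {"user": "u1", "hashtag": "#b", "reposted": 1},
    {"user": "u3", "hashtag": "#b", "reposted": 0},
    {"user": "u1", "hashtag": "#new", "reposted": 1},
    {"user": "u2", "hashtag": "#new", "reposted": 0},
]

iid_train, iid_test = random_split(records)
ood_train, ood_test = topic_holdout_split(records, ["#new"])
```

Under the hold-out split, any feature that is only informative for a specific hashtag has no counterpart in the test set, which is one way to read the drop in F1 that the abstract reports for content-based models.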
Problem

Research questions and friction points this paper is trying to address.

Predicting user reposting behavior on X for new topics
Improving out-of-distribution generalization in reposting prediction
Enhancing prediction using user profiles and past behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses user profile features for prediction
Incorporates past behavior data
Improves out-of-distribution generalization
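As a rough illustration of what "past behavior" features could look like, the sketch below derives two per-user statistics from training topics only and applies them to a user on an unseen topic. The feature names (`repost_rate`, `topics_reposted`), the record layout, and the simple threshold rule are assumptions made for this example; the threshold stands in for the standard classifiers mentioned in the summary and is not the paper's model.

```python
from collections import defaultdict


def user_history_features(train_records):
    """Per-user statistics computed only from training-topic records."""
    seen = defaultdict(int)       # messages shown to the user
    reposted = defaultdict(int)   # messages the user reposted
    topics = defaultdict(set)     # distinct hashtags the user reposted in
    for r in train_records:
        seen[r["user"]] += 1
        if r["reposted"]:
            reposted[r["user"]] += 1
            topics[r["user"]].add(r["hashtag"])
    return {
        u: {
            "repost_rate": reposted[u] / seen[u],
            "topics_reposted": len(topics[u]),
        }
        for u in seen
    }


def predict_repost(user, features, threshold=0.5):
    """Hypothetical stand-in for a classifier: threshold on repost rate.

    Users with no history are predicted not to repost.
    """
    f = features.get(user)
    return int(f is not None and f["repost_rate"] >= threshold)


# Toy training records from seen topics (hypothetical data).
train_records = [
    {"user": "u1", "hashtag": "#a", "reposted": 1},
    {"user": "u1", "hashtag": "#b", "reposted": 1},
    {"user": "u2", "hashtag": "#a", "reposted": 0},
    {"user": "u2", "hashtag": "#b", "reposted": 1},
    {"user": "u2", "hashtag": "#b", "reposted": 0},
]

features = user_history_features(train_records)
predictions = {u: predict_repost(u, features) for u in ["u1", "u2", "u9"]}
```

None of these features depend on the text of the message being scored, so they remain defined when the test topic is new, which is the property the listed contributions rely on.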

Ziming Xu
Centre for Artificial Intelligence, Department of Computer Science, University College London, UK
Shi Zhou
Centre for Artificial Intelligence, Department of Computer Science, University College London, UK
Vasileios Lampos
University College London
Machine Learning, Natural Language Processing, Artificial Intelligence, Digital Epidemiology
Ingemar J. Cox
Department of Computer Science, University College London / University of Copenhagen
digital epidemiology, information retrieval, digital watermarking, computer vision, multimedia