Can LLMs Forecast Internet Traffic from Social Media?

📅 2025-09-24
🤖 AI Summary
Existing traffic forecasting methods model only routine patterns and therefore struggle to anticipate sudden traffic surges triggered by societal events (e.g., celebrity deaths, software launches, major sporting events), which can overwhelm peering points and content delivery networks. This paper argues for a socio-technical forecasting approach that combines social signals with network measurements: a proof-of-concept system autonomously scrapes online discussions (headlines, forums, and social media), uses large language models to infer real-world events, clusters and enriches those events semantically, and correlates them with traffic measurements at a major Internet Exchange Point (IXP). The key idea is to treat public online discourse as a *semantic precursor*, an early indicator of traffic demand. After scraping a moderate amount of online discussions, the prototype predicted between 56% and 92% of society-driven traffic spikes, improving foresight into non-routine anomalies.

📝 Abstract
Societal events shape the Internet's behavior. The death of a prominent public figure, a software launch, or a major sports match can trigger sudden demand surges that overwhelm peering points and content delivery networks. Although these events fall outside regular traffic patterns, forecasting systems still rely solely on those patterns and therefore miss these critical anomalies. Thus, we argue for socio-technical systems that supplement technical measurements with an active understanding of the underlying drivers, including how events and collective behavior shape digital demands. We propose traffic forecasting using signals from public discourse, such as headlines, forums, and social media, as early demand indicators. To validate our intuition, we present a proof-of-concept system that autonomously scrapes online discussions, infers real-world events, clusters and enriches them semantically, and correlates them with traffic measurements at a major Internet Exchange Point. This prototype predicted between 56% and 92% of society-driven traffic spikes after scraping a moderate amount of online discussions. We believe this approach opens new research opportunities in cross-domain forecasting, scheduling, demand anticipation, and society-informed decision making.
Problem

Research questions and friction points this paper is trying to address.

Predicting internet traffic anomalies caused by societal events
Supplementing technical measurements with social media signals
Detecting society-driven traffic spikes using online discussions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to infer real-world events from scraped online discussions
Correlates social media signals with traffic measurements
Predicts society-driven traffic spikes from public discourse
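
The correlation step listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the trailing-window z-score detector, the `max_lead` window, the function names, and the toy hourly series are all assumptions made here for illustration. The sketch flags bursts in event mentions and spikes in traffic, then reports the share of traffic spikes that were preceded by a mention burst, which is one plausible reading of "predicted X% of society-driven traffic spikes".

```python
from statistics import mean, stdev


def find_spikes(series, window=6, z_threshold=3.0):
    """Return indices whose value exceeds the trailing-window mean
    by more than z_threshold standard deviations (hypothetical detector)."""
    spikes = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (series[i] - mu) / sigma > z_threshold:
            spikes.append(i)
    return spikes


def fraction_anticipated(traffic_spikes, mention_bursts, max_lead=3):
    """Share of traffic spikes that have a mention burst at the same step
    or up to max_lead steps earlier (assumed evaluation metric)."""
    if not traffic_spikes:
        return 0.0
    hits = sum(
        1
        for t in traffic_spikes
        if any(0 <= t - m <= max_lead for m in mention_bursts)
    )
    return hits / len(traffic_spikes)


# Toy hourly data, invented for this example (not from the paper).
traffic_gbps = [100, 102, 99, 101, 100, 98, 101, 100, 160,
                150, 101, 99, 100, 102, 100, 101, 170, 100]
event_mentions = [5, 6, 4, 5, 6, 5, 40, 38, 30,
                  6, 5, 4, 5, 6, 5, 4, 5, 6]

traffic_spikes = find_spikes(traffic_gbps)
mention_bursts = find_spikes(event_mentions)
share = fraction_anticipated(traffic_spikes, mention_bursts)

print("traffic spikes at hours:", traffic_spikes)
print("mention bursts at hours:", mention_bursts)
print("share of spikes anticipated:", share)
```

In this toy data the first traffic spike follows a mention burst by two hours and counts as anticipated, while the second has no preceding burst and would correspond to a surge that public discourse did not signal. A real system would replace the raw mention counts with the LLM-inferred, semantically clustered events described in the abstract.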
Authors
Jonatan Langlet
KTH Royal Institute of Technology & Digital Futures, Stockholm, Sweden
Mariano Scazzariello
Senior Researcher at RISE Research Institutes of Sweden
High-Speed Networking, Programmable Networks, Networking for AI, Large Language Models
Flavio Luciani
Namex, Rome, Italy
Marta Burocchi
Namex, Rome, Italy
Dejan Kostić
KTH Royal Institute of Technology, Stockholm, Sweden
Marco Chiesa
KTH Royal Institute of Technology
Networked systems, algorithms