🤖 AI Summary
Existing traffic forecasting methods model only routine patterns and therefore struggle to anticipate sudden traffic surges triggered by societal events (e.g., celebrity deaths, major sporting events), which risk overloading the network. This paper proposes a socio-technical forecasting paradigm that jointly leverages social signals and network measurements: web crawlers automatically collect text from social media, news outlets, and online forums; large language models perform event detection and semantic clustering to infer high-impact societal events; and a cross-domain correlation model links these inferred events to real-time traffic measurements from a major Internet Exchange Point (IXP). Its key contribution is the use of large-scale, publicly available online discourse as a *semantic precursor* for traffic prediction. After scraping a moderate amount of online discussion, the prototype anticipates 56%–92% of socially driven traffic peaks, substantially improving foresight into non-routine anomalies.
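To make the clustering step concrete, here is a minimal sketch that groups scraped headlines into candidate events. It is not the paper's method: the summary attributes clustering to large language models, whereas this sketch swaps in a much simpler technique, greedy single-pass clustering by Jaccard similarity of word sets. The names `cluster_headlines`, `tokens`, `jaccard`, `STOPWORDS`, and the `threshold` of 0.3 are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "and", "for", "is"}


def tokens(text: str) -> set[str]:
    """Lowercase the text, split it into alphanumeric words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_headlines(headlines: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single-pass clustering (a stand-in for semantic/LLM clustering).

    Each headline joins the first cluster whose accumulated vocabulary is at
    least `threshold`-similar; otherwise it starts a new candidate event.
    """
    clusters: list[tuple[set[str], list[str]]] = []
    for headline in headlines:
        words = tokens(headline)
        for vocab, members in clusters:
            if jaccard(words, vocab) >= threshold:
                members.append(headline)
                vocab |= words  # grow this cluster's vocabulary in place
                break
        else:
            clusters.append((set(words), [headline]))
    return [members for _, members in clusters]
```

For example, two headlines about the same football final and two about the same operating system release fall into two separate clusters. A real system would need semantic similarity (embeddings or an LLM) to merge paraphrases that share few literal words.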
📝 Abstract
Societal events shape the Internet's behavior. The death of a prominent public figure, a software launch, or a major sports match can trigger sudden demand surges that overwhelm peering points and content delivery networks. Although these events fall outside regular traffic patterns, forecasting systems still rely solely on those patterns and therefore miss these critical anomalies.
Thus, we argue for socio-technical systems that supplement technical measurements with an active understanding of the underlying drivers, including how events and collective behavior shape digital demand. We propose forecasting traffic using signals from public discourse, such as headlines, forums, and social media posts, as early indicators of demand.
To validate our intuition, we present a proof-of-concept system that autonomously scrapes online discussions, infers real-world events, clusters and enriches them semantically, and correlates them with traffic measurements at a major Internet Exchange Point. This prototype predicted between 56% and 92% of society-driven traffic spikes after scraping a moderate amount of online discussion.
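The correlation step and the "share of spikes predicted" figure can be illustrated with a minimal sketch. It assumes, hypothetically, that IXP traffic is a regularly sampled series, that a spike is any sample exceeding the trailing-window mean by `z` population standard deviations, and that a spike counts as anticipated if some inferred event precedes it by at most `lead` steps. The functions `detect_spikes` and `fraction_anticipated` and their parameters are illustrative assumptions, not the paper's correlation model or evaluation metric.

```python
from statistics import mean, pstdev


def detect_spikes(series: list[float], window: int = 24, z: float = 3.0) -> list[int]:
    """Return indices whose value exceeds the trailing-window mean by z std devs.

    The default window of 24 assumes hourly samples (one day of history).
    """
    spikes = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sd = mean(history), pstdev(history)
        if series[i] > mu + z * sd:
            spikes.append(i)
    return spikes


def fraction_anticipated(spike_times: list[int], event_times: list[int], lead: int = 6) -> float:
    """Share of spikes preceded by at least one inferred event within `lead` steps.

    Events that occur after a spike do not count as anticipating it.
    """
    if not spike_times:
        return 0.0
    hits = sum(
        any(0 <= spike - event <= lead for event in event_times)
        for spike in spike_times
    )
    return hits / len(spike_times)


# Usage: a periodic baseline with two injected surges at t=10 and t=25.
series = [100.0, 101.0, 99.0, 100.0, 101.0, 99.0] * 5
series[10], series[25] = 200.0, 180.0
spikes = detect_spikes(series, window=6, z=3.0)
# Only the event at t=7 precedes a spike; the event at t=27 arrives too late.
print(spikes, fraction_anticipated(spikes, event_times=[7, 27], lead=6))
```

A trailing-window z-score is used here only because it is simple and transparent. A production system would need seasonality-aware baselines so that routine daily peaks are not mistaken for event-driven surges.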
We believe this approach opens new research opportunities in cross-domain forecasting, scheduling, demand anticipation, and society-informed decision making.