🤖 AI Summary
In Internet traffic matrix (TM) forecasting, a single global model struggles to capture the temporal heterogeneity across diverse flows, which limits accuracy. To address this, we propose a grouped forecasting framework based on time-series clustering. Two new clustering strategies, source clustering (based on topological similarity among traffic sources) and histogram clustering (based on the shape of each flow's value distribution over time), partition the TM into behaviorally homogeneous subgroups, and each subgroup is modeled by a dedicated deep learning model. This improves both the modeling of localized temporal patterns and generalization. On the Abilene and GÉANT datasets, our method reduces RMSE by up to 92% and 75%, respectively, relative to existing TM prediction methods, and reduces the maximum link utilization (MLU) bias in routing optimization by 18% and 21%. Our core contribution is the integration of source-aware and distribution-aware clustering into TM forecasting, enabling fine-grained, interpretable, subgroup-specific modeling.
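The two grouping strategies can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that source clustering simply groups origin-destination (OD) flows sharing a source node, and that histogram clustering runs k-means (with deterministic farthest-point initialization) on a histogram of each flow's min-max normalized values. The names `source_clusters`, `shape_histogram`, `histogram_clusters`, and the `bins`/`k` parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Flow = Tuple[str, str]  # hypothetical (source, destination) flow identifier


def source_clusters(flows: Dict[Flow, Sequence[float]]) -> Dict[str, List[Flow]]:
    """Assumed source clustering: group OD flows that share a source node."""
    groups: Dict[str, List[Flow]] = defaultdict(list)
    for src, dst in flows:
        groups[src].append((src, dst))
    return dict(groups)


def shape_histogram(series: Sequence[float], bins: int) -> List[float]:
    """Histogram of a min-max normalized series, so it reflects shape, not scale."""
    lo, hi = min(series), max(series)
    span = hi - lo
    counts = [0] * bins
    for v in series:
        norm = (v - lo) / span if span > 0 else 0.0
        counts[min(int(norm * bins), bins - 1)] += 1  # the maximum goes in the last bin
    return [c / len(series) for c in counts]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def histogram_clusters(flows: Dict[Flow, Sequence[float]], k: int,
                       bins: int = 10) -> Dict[int, List[Flow]]:
    """Assumed histogram clustering: k-means over per-flow shape histograms."""
    ids = list(flows)
    hists = [shape_histogram(flows[f], bins) for f in ids]
    groups: Dict[int, List[Flow]] = defaultdict(list)
    for flow, label in zip(ids, kmeans(hists, k)):
        groups[label].append(flow)
    return dict(groups)
```

Each resulting group would then get its own forecasting model instead of sharing one global model. Because the histograms are computed on normalized values, flows that ramp evenly end up together and mostly idle flows with rare bursts end up together, regardless of their absolute volume.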
📝 Abstract
We present a novel framework that leverages time series clustering to improve Internet traffic matrix (TM) prediction using deep learning (DL) models. Traffic flows within a TM often exhibit diverse temporal behaviors, which can hinder prediction accuracy when a single model is trained across all flows. To address this, we propose two clustering strategies, source clustering and histogram clustering, that group flows with similar temporal patterns prior to model training. Clustering creates more homogeneous data subsets, enabling models to capture underlying patterns more effectively and generalize better than global prediction approaches that fit a single model to the entire TM. Compared to existing TM prediction methods, our method reduces root mean square error (RMSE) by up to 92% for Abilene and 75% for GÉANT. In routing scenarios, our clustered predictions also reduce maximum link utilization (MLU) bias by 18% and 21%, respectively, demonstrating the practical benefits of clustering when TMs are used for network optimization.
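The two evaluation quantities mentioned above, RMSE and MLU bias, can be made concrete with a small sketch. It assumes single-path routing given as a hypothetical per-flow list of links, and it defines MLU bias as the relative gap between the MLU computed from the predicted TM and from the true TM under the same fixed routing. The paper's exact metric definition and routing setup (for example, whether routing is re-optimized on the predicted TM) may differ; the function names here are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Flow = Tuple[str, str]  # (source, destination)
Link = Tuple[str, str]  # directed link (u, v)


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean square error over all (flattened) TM entries."""
    if len(pred) != len(true) or not true:
        raise ValueError("pred and true must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mlu(demand: Dict[Flow, float], routing: Dict[Flow, List[Link]],
        capacity: Dict[Link, float]) -> float:
    """Maximum link utilization: max over links of carried traffic / capacity."""
    load = {link: 0.0 for link in capacity}
    for flow, volume in demand.items():
        for link in routing[flow]:  # assumed: each flow follows one fixed path
            load[link] += volume
    return max(load[link] / capacity[link] for link in capacity)


def mlu_bias(pred: Dict[Flow, float], true: Dict[Flow, float],
             routing: Dict[Flow, List[Link]], capacity: Dict[Link, float]) -> float:
    """Assumed metric: relative gap between MLU under predicted and true TMs."""
    true_mlu = mlu(true, routing, capacity)
    return abs(mlu(pred, routing, capacity) - true_mlu) / true_mlu
```

For instance, on a triangle a, b, c with capacity 10 per link, where the a-to-c flow is routed via b, true demands of 2 (a-b), 3 (a-c), and 1 (b-c) give an MLU of 0.5 on link (a, b). Under-predicting the a-to-c demand as 2 gives an MLU of 0.4, an MLU bias of 20%.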