🤖 AI Summary
This work addresses the scarcity of millisecond-level training data for time series foundation models, which hinders their generalization to real-world high-frequency tasks. We introduce a novel millisecond-resolution time series dataset collected from an operational 5G wireless network deployment, capturing dynamic wireless states and traffic patterns and providing a realistic benchmark for high-frequency pretraining. Through zero-shot and fine-tuning evaluations, we assess prominent time series foundation models alongside traditional machine learning baselines on this dataset and find that most foundation model configurations perform poorly on high-frequency data distributions. Our study helps fill a gap in high-frequency pretraining resources and underscores the role of such data in improving model generalization and robustness in temporally demanding scenarios.
📝 Abstract
Time series foundation models (TSFMs) require diverse, real-world datasets to adapt across varying domains and temporal frequencies. However, current large-scale datasets predominantly cover low-frequency time series, with sampling intervals (i.e., time resolutions) ranging from seconds to years, which hinders the ability of TSFMs to capture the nuances of high-frequency time series data. To address this limitation, we introduce a novel dataset that captures millisecond-resolution wireless and traffic conditions from an operational 5G wireless deployment, expanding the scope of TSFMs to incorporate high-frequency data for pre-training. Further, the dataset introduces a new domain, wireless networks, complementing existing, more general domains such as energy and finance. The dataset also provides use cases for short-term forecasting, with prediction horizons spanning from 100 milliseconds (1 step) to 9.6 seconds (96 steps). By benchmarking traditional machine learning models and TSFMs on predictive tasks using this dataset, we demonstrate that most TSFM configurations perform poorly on this new data distribution in both zero-shot and fine-tuned settings. Our work underscores the importance of incorporating high-frequency datasets during pre-training and forecasting to enhance architectures, fine-tuning strategies, generalization, and robustness of TSFMs in real-world applications.
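The forecasting setup in the abstract implies a step of 100 milliseconds, since 1 step corresponds to 100 ms and 96 steps to 9.6 s. The sketch below illustrates that arithmetic and one common way to cut a series into (context, target) pairs for such horizons. It is only an illustration under stated assumptions: the helper names `horizon_seconds` and `make_windows`, the context length of 128 steps, the stride of 1, and the synthetic series are all hypothetical and are not taken from the paper or its dataset.

```python
from typing import List, Tuple

# Step size implied by the abstract: 1 step = 100 ms, 96 steps = 9.6 s.
STEP_MS = 100


def horizon_seconds(steps: int, step_ms: int = STEP_MS) -> float:
    """Convert a prediction horizon given in steps to seconds."""
    return steps * step_ms / 1000.0


def make_windows(
    series: List[float], context: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Slice a series into (context, target) pairs using a stride of 1.

    Each pair holds `context` past values followed by the next
    `horizon` values that a forecaster would be asked to predict.
    """
    pairs = []
    for start in range(len(series) - context - horizon + 1):
        past = series[start:start + context]
        future = series[start + context:start + context + horizon]
        pairs.append((past, future))
    return pairs


# Synthetic placeholder values (NOT data from the paper): 300 steps = 30 s.
series = [float(i % 7) for i in range(300)]

# Shortest and longest horizons mentioned in the abstract.
for steps in (1, 96):
    # context=128 is an assumed value chosen only for illustration.
    windows = make_windows(series, context=128, horizon=steps)
    print(f"{steps} step(s) = {horizon_seconds(steps)} s, {len(windows)} windows")
```

Longer horizons leave fewer usable windows for a fixed-length recording, which is one practical reason to report results per horizon rather than pooled.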