BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large-scale time-series datasets suffer from distributional skewness and uneven pattern coverage, severely limiting the generalization capability of pretrained models. To address this, we propose a balanced sampling strategy for constructing a pretraining corpus tailored to generic time-series forecasting: we introduce implicit clustering via statistical feature-based grid partitioning, coupled with Grid Mixup—a novel pattern-aware sampling and augmentation technique—to ensure equitable coverage of temporal patterns and enhance data diversity. The resulting corpus significantly improves both the diversity and representativeness of time-series patterns. Building upon this, we design a lightweight, efficient large-scale pretraining framework that achieves state-of-the-art (SOTA) performance on zero-shot forecasting tasks—using only ∼1/3 of the computational resources and training tokens required by prior approaches—while substantially enhancing cross-domain generalization and training efficiency.
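The "implicit clustering via statistical feature-based grid partitioning" described above can be sketched as follows: compute a few descriptive statistics per series, discretize each feature into bins, and treat each tuple of bin indices as a grid cell. This is a minimal illustration, not the paper's implementation — the actual feature set, bin counts, and binning scheme in BLAST are assumptions here.

```python
import numpy as np

def grid_partition(series_list, n_bins=4):
    """Assign each series to a grid cell via discretized statistical features.

    Illustrative sketch of feature-based grid partitioning; the concrete
    features (mean, std, linear trend, lag-1 autocorrelation) and the
    equal-width binning are assumptions, not BLAST's exact recipe.
    """
    def features(x):
        x = np.asarray(x, dtype=float)
        trend = np.polyfit(np.arange(len(x)), x, 1)[0]          # linear slope
        lag1 = np.corrcoef(x[:-1], x[1:])[0, 1] if len(x) > 2 else 0.0
        return np.array([x.mean(), x.std(), trend, lag1])

    feats = np.stack([features(s) for s in series_list])
    # Discretize each feature dimension into n_bins equal-width bins.
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    width = np.where(hi > lo, hi - lo, 1.0)
    bins = np.clip(((feats - lo) / width * n_bins).astype(int), 0, n_bins - 1)
    # A grid cell is the tuple of bin indices; series sharing a cell are
    # implicitly clustered together, with no explicit clustering algorithm.
    cells = {}
    for idx, b in enumerate(bins):
        cells.setdefault(tuple(b), []).append(idx)
    return cells
```

Cells then serve as the sampling units, so pattern coverage can be equalized across cells rather than across raw series counts.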

📝 Abstract
The advent of universal time series forecasting models has revolutionized zero-shot forecasting across diverse domains, yet the critical role of data diversity in training these models remains underexplored. Existing large-scale time series datasets often suffer from inherent biases and imbalanced distributions, leading to suboptimal model performance and generalization. To address this gap, we introduce BLAST, a novel pre-training corpus designed to enhance data diversity through a balanced sampling strategy. First, BLAST incorporates 321 billion observations from publicly available datasets and employs a comprehensive suite of statistical metrics to characterize time series patterns. Then, to facilitate pattern-oriented sampling, the data is implicitly clustered using grid-based partitioning. Furthermore, by integrating grid sampling and grid mixup techniques, BLAST ensures a balanced and representative coverage of diverse patterns. Experimental results demonstrate that models pre-trained on BLAST achieve state-of-the-art performance with a fraction of the computational resources and training tokens required by existing methods. Our findings highlight the pivotal role of data diversity in improving both training efficiency and model performance for the universal forecasting task.
Problem

Research questions and friction points this paper is trying to address.

Addressing data diversity gaps in universal time series forecasting models
Mitigating biases and imbalances in large-scale time series datasets
Enhancing model performance through balanced pattern-oriented sampling strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Balanced sampling strategy enhances data diversity
Grid-based partitioning for pattern-oriented sampling
Grid sampling and mixup ensure representative coverage
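The grid sampling and mixup idea above can be sketched as: draw a grid cell uniformly at random (so rare patterns are sampled as often as common ones), then mix two series from that cell with a Beta-distributed weight. This is a hedged sketch built on a generic mixup formulation; BLAST's actual cell weighting, mixup coefficient, and augmentation details are assumptions here.

```python
import numpy as np

def grid_mixup_sample(cells, series_list, alpha=0.4, rng=None):
    """Draw one balanced training sample via grid sampling + mixup.

    `cells` maps a grid-cell key to the list of series indices it contains
    (e.g. as produced by a feature-based grid partition). Sampling is
    uniform over cells, not over series, which equalizes pattern coverage.
    """
    rng = rng or np.random.default_rng()
    keys = list(cells.keys())
    cell = keys[rng.integers(len(keys))]        # uniform over cells
    members = cells[cell]
    i = members[rng.integers(len(members))]
    j = members[rng.integers(len(members))]
    lam = rng.beta(alpha, alpha)                # mixup interpolation weight
    a = np.asarray(series_list[i], dtype=float)
    b = np.asarray(series_list[j], dtype=float)
    n = min(len(a), len(b))                     # align lengths before mixing
    return lam * a[:n] + (1 - lam) * b[:n]
```

Because both parents come from the same cell, the mixed series stays within one pattern region while still adding diversity inside it.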
👥 Authors

Zezhi Shao
Institute of Computing Technology, Chinese Academy of Sciences
Time Series Forecasting · Spatial-Temporal Data Mining · Graph Data Mining

Yujie Li
Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences

Fei Wang
Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences

Chengqing Yu
Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences

Yisong Fu
Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences

Tangwen Qian
Institute of Computing Technology, Chinese Academy of Sciences

Bin Xu
Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences

Boyu Diao
Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences

Yongjun Xu
Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of AI Safety; University of Chinese Academy of Sciences

Xueqi Cheng
Ph.D. student, Florida State University
Data mining · LLM · GNN · Computational social science