Scaling-laws for Large Time-series Models

📅 2024-05-22
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work investigates the scaling relationship between model size and performance for large time-series models, establishing, for the first time in time series forecasting, power-law scaling of loss with parameter count ($N$), dataset size ($D$), and compute budget ($C$): $L \propto N^{-\alpha} D^{-\beta} C^{-\gamma}$. Methodologically, the authors train decoder-only Transformer architectures on a large heterogeneous time series corpus, following standardized scaling-experiment protocols and fitting the empirical losses with power-law regressions spanning five orders of magnitude. Key results show that the scaling behavior is highly robust, exhibiting minimal sensitivity to architectural details (e.g., width-to-depth ratio, number of attention heads), and that forecast error can be accurately predicted jointly from $N$, $D$, and $C$. The study thus provides the first quantifiable, reproducible engineering framework to guide the design, training, and resource allocation of time-series foundation models.
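
The core fitting step is simple to reproduce in outline. Below is a minimal sketch of the parameter-count axis of such a fit, $L \propto N^{-\alpha}$, done as linear regression in log-log space; the loss values and the `numpy`-based approach are illustrative assumptions, not the paper's actual data or pipeline.

```python
import numpy as np

# Hypothetical (model size, validation loss) pairs spanning several orders
# of magnitude in N; placeholders standing in for the paper's measurements.
N = np.array([1e5, 1e6, 1e7, 1e8, 1e9])
L = np.array([1.20, 0.85, 0.61, 0.44, 0.31])

# Fit L = a * N^(-alpha) via least squares on log L = log a - alpha * log N.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha, a = -slope, np.exp(intercept)
print(f"fitted exponent alpha = {alpha:.3f}, prefactor a = {a:.3g}")
```

The same regression applies independently along the dataset-size ($D$) and compute ($C$) axes to recover $\beta$ and $\gamma$.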

📝 Abstract
Scaling laws for large language models (LLMs) have provided useful guidance in training ever larger models for predictable performance gains. Time series forecasting shares a similar sequential structure to language, and is amenable to large-scale transformer architectures. Here we show that foundational decoder-only time series transformer models exhibit scaling behavior analogous to LLMs, with architectural details (aspect ratio and number of heads) having a minimal effect over broad ranges. We assemble a large corpus of heterogeneous time series data on which to train, and establish for the first time power-law scaling with parameter count, dataset size, and training compute, spanning five orders of magnitude.
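
To make the aspect-ratio claim concrete: at a fixed parameter budget $N$, very different depth/width trade-offs are reported to reach similar loss. The sketch below uses the standard approximation $N \approx 12\,n_{\text{layer}}\,d_{\text{model}}^2$ for decoder-only transformers (embeddings excluded); the approximation and the example configurations are assumptions for illustration, not the paper's exact counting convention or model grid.

```python
# Approximate decoder-only transformer parameter count, excluding embeddings:
# each block holds ~4*d^2 attention weights + ~8*d^2 MLP weights (4x expansion).
def param_count(n_layer: int, d_model: int) -> int:
    return 12 * n_layer * d_model**2

# Widely varying aspect ratios (d_model / n_layer) at a roughly matched ~100M
# parameter budget, the regime where architecture reportedly matters little.
for n_layer, d_model in [(6, 1184), (12, 832), (24, 592), (48, 416)]:
    n = param_count(n_layer, d_model)
    print(f"layers={n_layer:2d}  width={d_model:4d}  "
          f"aspect={d_model / n_layer:6.1f}  N~{n / 1e6:5.1f}M")
```
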
Problem

Research questions and friction points this paper is trying to address.

Time Series Analysis
Model Scaling
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time Series Prediction
Model Scaling Laws
Performance Enhancement
Authors
Thomas D. P. Edwards
Johns Hopkins University
James Alvey
University of Cambridge, University of Amsterdam
Justin Alsing
Stockholm University, Calda AI
Nam H. Nguyen
Capital One
Benjamin D. Wandelt
Institut d’Astrophysique de Paris, CCA, Flatiron Institute

Keywords: Foundation models, deep learning, machine learning, time series