🤖 AI Summary
This work investigates the scaling relationship between model size and performance for large-scale time series models, establishing, for the first time in time series forecasting, a power-law scaling law relating loss to parameter count ($N$), dataset size ($D$), and compute budget ($C$): $L \propto N^{-\alpha} D^{-\eta} C^{-\gamma}$. Methodologically, we train decoder-only Transformer architectures on a large-scale heterogeneous time series corpus, following standardized scaling-experiment protocols and fitting empirical losses with power-law regressions spanning five orders of magnitude. Key results show that the scaling law is highly robust, with minimal sensitivity to architectural details (e.g., width-to-depth ratio, number of attention heads), and that forecasting loss can be accurately predicted from $N$, $D$, and $C$ jointly. This study provides the first quantifiable, reproducible engineering framework to guide the design, training, and resource allocation of time-series foundation models.
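To make the power-law fitting step concrete, here is a minimal sketch of how one scaling exponent (e.g., $\alpha$ for parameter count $N$) can be estimated by linear regression in log-log space. The data here are synthetic and the constants (`A_true`, `alpha_true`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Synthetic losses following L = A * N^{-alpha} with small multiplicative
# noise; A_true and alpha_true are assumed for illustration only.
rng = np.random.default_rng(0)
A_true, alpha_true = 5.0, 0.07
N = np.logspace(5, 10, 12)  # parameter counts spanning five orders of magnitude
L = A_true * N ** (-alpha_true) * np.exp(rng.normal(0.0, 0.01, N.size))

# log L = log A - alpha * log N, so ordinary least squares on the logs
# recovers the exponent as the negated slope.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha_hat, A_hat = -slope, np.exp(intercept)

print(f"alpha ~ {alpha_hat:.3f}, A ~ {A_hat:.2f}")
```

The same log-linear fit applies to the dataset-size and compute exponents; a joint fit over $(N, D, C)$ would simply regress $\log L$ on all three log-covariates at once.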
📝 Abstract
Scaling laws for large language models (LLMs) have provided useful guidance for training ever-larger models with predictable performance gains. Time series forecasting shares a sequential structure similar to that of language and is amenable to large-scale transformer architectures. Here we show that foundational decoder-only time series transformer models exhibit scaling behavior analogous to that of LLMs, with architectural details (aspect ratio and number of heads) having a minimal effect over broad ranges. We assemble a large corpus of heterogeneous time series data on which to train, and establish for the first time power-law scaling with parameter count, dataset size, and training compute, spanning five orders of magnitude.