🤖 AI Summary
This work addresses the lack of established scaling laws and scalable foundation models in time series forecasting. It proposes a single training recipe that yields a family of five foundation models ranging from 4M to 2.5B parameters. Using a consistent architecture, large-scale data, standardized training protocols, and a u-muP hyperparameter transfer pipeline, the study shows empirically that forecast quality improves reliably with model scale. The resulting Toto 2.0 models achieve state-of-the-art results on three benchmarks (BOOM, GIFT-Eval, and TIME), and all five base checkpoints are released as open weights under the Apache 2.0 license to support further research and reproducibility.
📝 Abstract
We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe. The Toto 2.0 family sets a new state of the art on three forecasting benchmarks: BOOM, our observability benchmark; GIFT-Eval, the standard general-purpose benchmark; and the recent contamination-resistant TIME benchmark. This report describes our experimental results and details the design decisions behind Toto 2.0: its architecture and training recipe, training data, and the u-muP hyperparameter transfer pipeline. All five base checkpoints are released under Apache 2.0.
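A claim that forecast quality improves reliably with parameter count is commonly checked by fitting a power law in log-log space. The sketch below illustrates that generic procedure only; it is not the authors' analysis. The three intermediate model sizes, the constants `true_c` and `true_alpha`, and the helper `fit_power_law` are all hypothetical, and the loss values are synthetic rather than reported results. Only the 4M and 2.5B endpoints and the count of five models come from the abstract.

```python
import numpy as np


def fit_power_law(params, losses):
    """Fit loss ~= c * params**(-alpha) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(params), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Five model sizes. Only the endpoints (4M, 2.5B) are from the abstract;
# the geometric spacing in between is an assumption for illustration.
params = np.geomspace(4e6, 2.5e9, num=5)

# Synthetic, noise-free losses generated from a known power law.
# These are NOT measured results from any Toto 2.0 model.
true_c, true_alpha = 2.0, 0.05
losses = true_c * params ** (-true_alpha)

c, alpha = fit_power_law(params, losses)
print(f"c={c:.3f}, alpha={alpha:.3f}")
```

With real evaluation losses in place of the synthetic ones, a positive fitted `alpha` and small residuals around the fitted line would be consistent with steady improvement across the size range.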