🤖 AI Summary
Addressing the longstanding trade-off among accuracy, efficiency, and model size in probabilistic time series forecasting, this paper introduces Moirai 2.0, a decoder-only foundation model pretrained on a new corpus of 36 million real-world time series. The model adopts single-patch inputs, quantile forecasting, and a recursive multi-quantile decoding mechanism, replacing the masked-encoder training and mixture-distribution outputs of Moirai 1.0 and enabling end-to-end optimization via a quantile loss. Compared to its prior best version, Moirai 1.0-Large, Moirai 2.0 reduces parameter count by 30× and doubles inference speed, while ranking among the top pretrained models on the Gift-Eval benchmark. Ablations and domain-level experiments further demonstrate strong generalization and robustness, supporting lightweight decoder-only architectures for large-scale time series foundation modeling.
📝 Abstract
We introduce Moirai 2.0, a decoder-only time-series foundation model trained on a new corpus of 36M series. The model adopts quantile forecasting and multi-token prediction, improving both probabilistic accuracy and inference efficiency. On the Gift-Eval benchmark, it ranks among the top pretrained models while achieving a strong trade-off between accuracy, speed, and model size. Compared to Moirai 1.0, Moirai 2.0 replaces masked-encoder training, multi-patch inputs, and mixture-distribution outputs with a simpler decoder-only architecture, single-patch inputs, and a quantile loss. Ablation studies isolate these changes, showing that the decoder-only backbone along with recursive multi-quantile decoding contribute most to the gains. Additional experiments show that Moirai 2.0 outperforms larger models from the same family and exhibits robust domain-level results. In terms of efficiency and model size, Moirai 2.0 is twice as fast and thirty times smaller than its prior best version, Moirai 1.0-Large, while also performing better. Model performance plateaus with increasing parameter count and declines at longer horizons, motivating future work on data scaling and long-horizon modeling. We release code and evaluation details to support further research.
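The quantile loss mentioned above is the standard pinball loss: training a model to minimize it at several quantile levels yields a distribution-free probabilistic forecast, which is what lets Moirai 2.0 drop the mixture-distribution output head. A minimal sketch of this general objective (not the paper's exact implementation; function names and the sample values are illustrative):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss at level q in (0, 1).

    Under-prediction (y_true > y_pred) is penalized by q,
    over-prediction by (1 - q), so the minimizer is the
    q-th conditional quantile of the target.
    """
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Illustrative values: a median forecast (q = 0.5) penalizes
# errors in either direction symmetrically ...
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.5])
print(pinball_loss(y_true, y_pred, 0.5))

# ... while an upper quantile (q = 0.9) penalizes under-prediction
# far more heavily, pushing that output toward the distribution's tail.
print(pinball_loss(np.array([2.0]), np.array([1.0]), 0.9))
```

Summing this loss over a grid of quantile levels (e.g. 0.1 through 0.9) gives a single end-to-end training objective, and the per-level outputs together form the model's probabilistic forecast.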