🤖 AI Summary
Existing wind power forecasting methods face two key bottlenecks: site-specific models suffer from poor generalizability, while generic time-series foundation models struggle to incorporate domain-specific energy priors. To address this, we propose the first dedicated foundation model for zero-shot cross-site wind power forecasting. Our method introduces a hierarchical time-series tokenizer that explicitly captures meteorological–power coupling relationships, and adopts a lightweight decoder-only Transformer integrated with a multivariate discretization-based generative framework, pre-trained autoregressively on large-scale wind energy data. With only 8.1 million parameters, our model achieves state-of-the-art performance on both deterministic and probabilistic forecasting tasks. Moreover, it demonstrates strong robustness and zero-shot transfer capability under out-of-distribution settings, particularly across continental-scale site shifts, without any fine-tuning.
📝 Abstract
High-quality wind power forecasting is crucial for the operation of modern power grids. However, prevailing data-driven paradigms either train a site-specific model, which cannot generalize to other locations, or rely on fine-tuning general-purpose time-series foundation models, which struggle to incorporate domain-specific data from the energy sector. This paper introduces WindFM, a lightweight and generative Foundation Model designed specifically for probabilistic wind power forecasting. WindFM employs a discretize-and-generate framework: a specialized time-series tokenizer first converts continuous multivariate observations into discrete, hierarchical tokens; a decoder-only Transformer then learns a universal representation of wind generation dynamics by autoregressively pre-training on these token sequences. Using the comprehensive WIND Toolkit dataset, comprising approximately 150 billion time steps from more than 126,000 sites, WindFM develops a foundational understanding of the complex interplay between atmospheric conditions and power output. Extensive experiments demonstrate that our compact 8.1M-parameter model achieves state-of-the-art zero-shot performance on both deterministic and probabilistic tasks, outperforming specialized models and larger foundation models without any fine-tuning. In particular, WindFM exhibits strong adaptability under out-of-distribution data from a different continent, demonstrating the robustness and transferability of its learned representations. Our pre-trained model is publicly available at https://github.com/shiyu-coder/WindFM.
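To make the discretize-and-generate idea concrete, here is a minimal sketch of its two stages: quantizing a continuous signal into a token vocabulary, then sampling future tokens autoregressively. This is an illustrative stand-in only; WindFM's actual tokenizer is hierarchical and multivariate, and the `model` interface below (a callable returning next-token probabilities) is hypothetical, not the released API.

```python
import numpy as np

def tokenize(series, vmin, vmax, vocab_size=256):
    """Uniformly quantize a continuous series into integer token ids.

    Illustrative stand-in for WindFM's hierarchical tokenizer: each value
    is clipped to [vmin, vmax] and mapped to one of `vocab_size` bins.
    """
    clipped = np.clip(series, vmin, vmax)
    bins = np.floor((clipped - vmin) / (vmax - vmin) * (vocab_size - 1))
    return bins.astype(int)

def detokenize(tokens, vmin, vmax, vocab_size=256):
    """Map token ids back to the centers of their value bins."""
    return vmin + (np.asarray(tokens) + 0.5) / vocab_size * (vmax - vmin)

def generate(model, prefix_tokens, horizon, rng):
    """Autoregressively sample `horizon` future tokens.

    `model` is a hypothetical callable: given the token history, it returns
    a probability vector over the vocabulary for the next token. Sampling
    (rather than argmax) yields trajectories usable for probabilistic
    forecasts, e.g. by drawing many rollouts and taking quantiles.
    """
    seq = list(prefix_tokens)
    samples = []
    for _ in range(horizon):
        probs = model(seq)
        nxt = int(rng.choice(len(probs), p=probs))
        seq.append(nxt)
        samples.append(nxt)
    return samples
```

Discretization turns forecasting into next-token classification, so the Transformer's output softmax directly defines a full predictive distribution over power levels at each step, which is what enables both deterministic (point) and probabilistic readouts from one model.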