Scaling Laws of Global Weather Models

📅 2026-02-26
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This study systematically investigates the scaling laws of data-driven weather forecasting models, elucidating how model size, training data volume, and computational budget jointly influence predictive performance. Through large-scale empirical analysis comparing state-of-the-art models such as Aurora and GraphCast under diverse configurations, the work reveals that weather models exhibit scaling behaviors markedly distinct from those of language models: performance gains are more sensitive to increasing network width than depth, and extended training time yields greater improvements than simply scaling up model parameters. Experiments demonstrate that Aurora achieves the highest data-scaling efficiency—yielding a 3.2× reduction in loss with a 10× increase in data—while GraphCast exhibits superior parameter efficiency. The study further proposes optimized strategies for allocating computational resources, offering both theoretical insights and practical guidance for efficient weather modeling.
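Read as a power law, the quoted data-scaling figure pins down an exponent. Assuming the standard form $L(D) = a\,D^{-\alpha}$ (a modeling assumption; the summary does not state the functional form), a 3.2× loss reduction per 10× data gives

$$\frac{L(D)}{L(10D)} = 10^{\alpha} = 3.2 \quad\Longrightarrow\quad \alpha = \log_{10} 3.2 \approx 0.51.$$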

📝 Abstract
Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
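To make the fitting procedure behind such scaling statements concrete, here is a minimal sketch of how a data-scaling exponent can be estimated from measured losses. All numbers are synthetic placeholders chosen to match the quoted "3.2x per 10x" behavior, not values from the paper:

```python
# Minimal sketch (not the paper's method or data): recovering a data-scaling
# exponent from (dataset size, validation loss) pairs, assuming the power-law
# form L(D) = a * D**(-alpha).
import numpy as np

D = np.array([1e4, 1e5, 1e6, 1e7])          # hypothetical dataset sizes
loss = np.array([1.00, 0.31, 0.10, 0.031])  # hypothetical validation losses

# A power law is linear in log-log space:
#   log10(L) = log10(a) - alpha * log10(D),
# so a least-squares line fit gives -alpha as the slope.
slope, intercept = np.polyfit(np.log10(D), np.log10(loss), 1)
alpha = -slope

print(f"alpha ~ {alpha:.2f}")                             # ~0.50
print(f"loss reduction per 10x data ~ {10**alpha:.1f}x")  # ~3.2x
```

Fitting in log-log space keeps the regression well-conditioned across the several orders of magnitude that $D$ spans.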
Problem

Research questions and friction points this paper is trying to address.

scaling laws
weather forecasting
data-driven models
model performance
compute budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

scaling laws
weather forecasting
data-driven models
model width vs depth
compute-optimal training
Yuejiang Yu
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Langwen Huang
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Alexandru Calotoiu
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Torsten Hoefler
Professor of Computer Science at ETH Zurich
High Performance Computing · Deep Learning · Networking · Message Passing Interface · Parallel and Distributed Computing