Scaling Laws of Global Weather Models

📅 2026-02-26
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This study systematically investigates the scaling laws of data-driven weather forecasting models, elucidating how model size, training data volume, and computational budget jointly influence predictive performance. Through large-scale empirical analysis comparing state-of-the-art models such as Aurora and GraphCast under diverse configurations, the work reveals that weather models exhibit scaling behaviors markedly distinct from those of language models: performance gains are more sensitive to increasing network width than depth, and extended training time yields greater improvements than simply scaling up model parameters. Experiments demonstrate that Aurora achieves the highest data-scaling efficiency—yielding a 3.2× reduction in loss with a 10× increase in data—while GraphCast exhibits superior parameter efficiency. The study further proposes optimized strategies for allocating computational resources, offering both theoretical insights and practical guidance for efficient weather modeling.
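Read as a power law, the quoted data-scaling figure pins down an exponent. Assuming the standard form $L(D) = a\,D^{-\alpha}$ (a modeling assumption; the summary does not state the functional form), a 3.2× loss reduction per 10× data gives

$$\frac{L(D)}{L(10D)} = 10^{\alpha} = 3.2 \quad\Longrightarrow\quad \alpha = \log_{10} 3.2 \approx 0.51.$$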

📝 Abstract
Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
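To make the fitting procedure behind such scaling statements concrete, here is a minimal sketch of how a data-scaling exponent can be estimated from measured losses. All numbers are synthetic placeholders chosen to match the quoted "3.2x per 10x" behavior, not values from the paper:

```python
# Minimal sketch (not the paper's method or data): recovering a data-scaling
# exponent from (dataset size, validation loss) pairs, assuming the power-law
# form L(D) = a * D**(-alpha).
import numpy as np

D = np.array([1e4, 1e5, 1e6, 1e7])          # hypothetical dataset sizes
loss = np.array([1.00, 0.31, 0.10, 0.031])  # hypothetical validation losses

# A power law is linear in log-log space:
#   log10(L) = log10(a) - alpha * log10(D),
# so a least-squares line fit gives -alpha as the slope.
slope, intercept = np.polyfit(np.log10(D), np.log10(loss), 1)
alpha = -slope

print(f"alpha ~ {alpha:.2f}")                             # ~0.50
print(f"loss reduction per 10x data ~ {10**alpha:.1f}x")  # ~3.2x
```

Fitting in log-log space keeps the regression well-conditioned across the several orders of magnitude that $D$ spans.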
Problem

Research questions and friction points this paper is trying to address.

scaling laws
weather forecasting
data-driven models
model performance
compute budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

scaling laws
weather forecasting
data-driven models
model width vs depth
compute-optimal training
Yuejiang Yu
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Langwen Huang
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Alexandru Calotoiu
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Torsten Hoefler
Professor of Computer Science at ETH Zurich
High Performance Computing · Deep Learning · Networking · Message Passing Interface · Parallel and Distributed Computing