🤖 AI Summary
This work proposes PARADIS, a machine learning weather forecasting model that explicitly embeds physical structure into its architecture to address the challenge of efficiently modeling long-range transport processes such as advection, which are typically encoded only implicitly in monolithic networks. Through functional decomposition, PARADIS separates prediction into distinct advection, diffusion, and reaction modules, and introduces a neural semi-Lagrangian operator that performs trajectory-based, differentiable interpolation on the sphere for transport. The model learns both the latent variables and their transport trajectories end to end. Trained at 1° resolution on the ERA5 benchmark for less than one GPU-month, it matches or exceeds both traditional numerical models, such as the ECMWF HRES forecast at 0.25° resolution, and state-of-the-art machine learning baselines such as GraphCast, at a fraction of the computational cost.
📝 Abstract
Recent machine-learning approaches to weather forecasting often employ a monolithic architecture, in which distinct physical mechanisms, such as advective transport, diffusion-like mixing, thermodynamic processes, and forcing, are represented implicitly within a single large network. This representation is particularly problematic for advection, where long-range transport must be treated with expensive global interaction mechanisms or through deep stacks of convolutional layers. To mitigate this, we present PARADIS, a physics-inspired global weather prediction model that imposes inductive biases on network behavior through a functional decomposition into advection, diffusion, and reaction blocks acting on latent variables. We implement advection through a Neural Semi-Lagrangian operator that performs trajectory-based transport via differentiable interpolation on the sphere, enabling end-to-end learning of both the latent modes to be transported and their characteristic trajectories. Diffusion-like processes are modeled through depthwise-separable spatial mixing, while local source terms and vertical interactions are modeled via pointwise channel interactions, yielding operator-level physical structure. PARADIS provides state-of-the-art forecast skill at a fraction of the training cost: on ERA5-based benchmarks, the 1° PARADIS model, with a total training cost of less than one GPU-month, meets or exceeds the performance of 0.25° traditional and machine-learning baselines, including the ECMWF HRES forecast and DeepMind's GraphCast.
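To make the semi-Lagrangian idea concrete, the following is a minimal planar sketch of one trajectory-based advection step: each grid point is traced backward along a velocity field to its departure point, and the field is sampled there by bilinear interpolation. This is an illustration only, not the paper's implementation: PARADIS interpolates on the sphere, acts on learned latent modes, and learns the velocities end to end, whereas here the grid is a flat periodic array and the velocities are supplied by hand. All function and variable names are illustrative.

```python
import numpy as np

def bilinear_interpolate(field, x, y):
    """Bilinear interpolation of a 2D field at fractional coordinates
    (x, y), with periodic boundary conditions. This smooth sampling is
    what makes a semi-Lagrangian step differentiable end to end."""
    H, W = field.shape
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    dx, dy = x - x0, y - y0
    x0 %= W; x1 = (x0 + 1) % W
    y0 %= H; y1 = (y0 + 1) % H
    return (field[y0, x0] * (1 - dx) * (1 - dy)
            + field[y0, x1] * dx * (1 - dy)
            + field[y1, x0] * (1 - dx) * dy
            + field[y1, x1] * dx * dy)

def semi_lagrangian_step(field, u, v, dt):
    """One semi-Lagrangian advection step: trace each grid point
    backward along the velocity field (u, v) over time dt and sample
    the field at the resulting departure point. In PARADIS the
    analogous velocities are learned, not prescribed."""
    H, W = field.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x_dep = xx - dt * u   # departure points on the backward trajectory
    y_dep = yy - dt * v
    return bilinear_interpolate(field, x_dep, y_dep)
```

For example, advecting a point mass with a uniform eastward velocity `u = 1` for `dt = 1` shifts it one grid cell to the right while conserving its total mass; in a learned model, gradients flow through the interpolation weights back to the velocity field.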