🤖 AI Summary
Autoregressive weather forecasts often develop large-scale structural distortions and physical inconsistencies as single-step errors accumulate over a rollout. To address this, the work proposes Agent-Guided Cross-modal Decoding (AGCD), a plug-and-play paradigm that injects physics priors at decoding time. A multi-agent pipeline built on multimodal large language models (MLLMs) generates priors conditioned on the current atmospheric state, and the decoder applies them through region-aware multi-scale tokenization and cross-modal regional interaction, giving controllable and reusable guidance without modifying the backbone architecture. Experiments on WeatherBench show consistent improvements in 6-hour forecast accuracy across two resolutions and several backbone models. Under strictly causal 48-hour autoregressive inference, the method also reduces early-stage error accumulation and improves long-horizon stability.
📝 Abstract
Accurate weather forecasting is more than grid-wise regression: it must preserve coherent synoptic structures and the physical consistency of meteorological fields, especially under autoregressive rollouts, where small one-step errors can amplify into structural bias. Existing physics-prior approaches typically impose global, one-off constraints through architectures, regularization, or NWP coupling, offering limited state-adaptive and sample-specific controllability at deployment. To bridge this gap, we propose Agent-Guided Cross-modal Decoding (AGCD), a plug-and-play decoding-time prior-injection paradigm that derives state-conditioned physics priors from the current multivariate atmospheric state and injects them into forecasters in a controllable and reusable way. Specifically, we design a multi-agent meteorological narration pipeline that uses MLLMs to extract meteorological elements and generate state-conditioned physics priors. To apply these priors, AGCD further introduces cross-modal region interaction decoding, which performs region-aware multi-scale tokenization and efficient physics-prior injection to refine visual features without changing the backbone interface. Experiments on WeatherBench demonstrate consistent gains for 6-hour forecasting across two resolutions (5.625° and 1.40625°) and diverse backbones (generic and weather-specialized). In strictly causal 48-hour autoregressive rollouts, AGCD also reduces early-stage error accumulation and improves long-horizon stability.
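To make the rollout setting concrete, the following is a minimal sketch of a strictly causal autoregressive rollout, in which each 6-hour step is fed only the model's own previous output, so 48 hours corresponds to 8 chained steps. Everything here is an illustrative assumption rather than the AGCD implementation: the state is a short list of floats, and `true_step` and `biased_step` are hypothetical toy dynamics standing in for the atmosphere and for a forecaster with a small one-step error.

```python
import math
from typing import Callable, List

State = List[float]


def rollout(step: Callable[[State], State], initial: State,
            lead_hours: int = 48, step_hours: int = 6) -> List[State]:
    """Strictly causal rollout: every step consumes only the previous output."""
    if lead_hours % step_hours != 0:
        raise ValueError("lead_hours must be a multiple of step_hours")
    states: List[State] = []
    current = list(initial)
    for _ in range(lead_hours // step_hours):
        current = step(current)  # no ground truth is fed back in
        states.append(current)
    return states


def rmse(a: State, b: State) -> float:
    """Root-mean-square error between two states of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical toy dynamics (assumption, not a weather model).
def true_step(state: State) -> State:
    return [0.9 * x for x in state]


# Hypothetical forecaster with a small constant one-step bias (assumption).
def biased_step(state: State) -> State:
    return [0.9 * x + 0.05 for x in state]


initial = [1.0, 2.0, 3.0]
truth = rollout(true_step, initial)
forecast = rollout(biased_step, initial)
errors = [rmse(f, t) for f, t in zip(forecast, truth)]

for i, err in enumerate(errors, start=1):
    print(f"+{6 * i:2d} h  RMSE = {err:.4f}")
```

In this toy setup the error at each lead time is larger than at the previous one, because every step inherits the error of the step before it. That compounding is what makes error reduction in the earliest steps valuable for the whole rollout.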