🤖 AI Summary
Short-term precipitation nowcasting demands both high accuracy and real-time inference, yet existing token-based autoregressive models suffer from flawed inductive biases and slow inference, while diffusion models incur prohibitive computational overhead. To address this, we propose BlockGPT, a frame-level autoregressive video forecasting framework. It introduces batched tokenization to discretize 2D precipitation fields into per-frame blocks of tokens and decouples spatiotemporal modeling: intra-frame self-attention captures spatial structure, while inter-frame causal attention ensures temporal consistency. This design is architecture-agnostic and compatible with standard Transformer backbones. Evaluated on the KNMI and SEVIR benchmarks, BlockGPT achieves state-of-the-art performance, improving event localization (F1 score +8.2%) and classification (AUC +5.6%). Crucially, it attains up to 31× faster single-step inference than comparable baselines, meeting operational real-time requirements.
📝 Abstract
Predicting precipitation maps is a highly complex spatiotemporal modeling task, critical for mitigating the impacts of extreme weather events. Short-term precipitation forecasting, or nowcasting, requires models that are not only accurate but also computationally efficient for real-time applications. Current methods, such as token-based autoregressive models, often suffer from flawed inductive biases and slow inference, while diffusion models can be computationally intensive. To address these limitations, we introduce BlockGPT, a generative autoregressive transformer that uses a batched tokenization (Block) method to predict full two-dimensional fields (frames) at each time step. Conceived as a model-agnostic paradigm for video prediction, BlockGPT factorizes space-time by using self-attention within each frame and causal attention across frames; in this work, we instantiate it for precipitation nowcasting. We evaluate BlockGPT on two precipitation datasets, viz. KNMI (Netherlands) and SEVIR (U.S.), comparing it to state-of-the-art baselines including token-based (NowcastingGPT) and diffusion-based (DiffCast+Phydnet) models. The results show that BlockGPT achieves superior accuracy, better event localization as measured by categorical metrics, and inference speeds up to 31× faster than comparable baselines.
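The space-time factorization described above (full self-attention among tokens of the same frame, causal attention across frames) implies a block-lower-triangular attention mask. The sketch below is not from the paper; it is a minimal illustration of that mask, assuming the 2D frames have already been tokenized and flattened into a sequence grouped frame by frame, with a fixed number of tokens per frame:

```python
import numpy as np

def block_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean attention mask (True = attention allowed).

    Each token attends to every token in its own frame
    (intra-frame self-attention) and to all tokens in earlier
    frames (inter-frame causal attention), but never to tokens
    in future frames.
    """
    n = num_frames * tokens_per_frame
    # frame index of each flattened token position
    frame_of = np.arange(n) // tokens_per_frame
    # query i may attend key j iff j's frame is not in i's future
    return frame_of[None, :] <= frame_of[:, None]

# Example: 3 frames of 2 tokens each -> a 6x6 block-lower-triangular mask
mask = block_causal_mask(3, 2)
```

In an attention layer, positions where the mask is `False` would be set to `-inf` before the softmax, so that generating frame `t` conditions on all tokens of frames `1..t` while frames within the context remain fully visible to themselves.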