RainDiff: End-to-end Precipitation Nowcasting Via Token-wise Attention Diffusion

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Precipitation nowcasting is highly challenging because the atmosphere is chaotic and its spatial and temporal dynamics are strongly coupled. Existing diffusion models face scalability bottlenecks: latent-space approaches rely on a separately trained autoencoder, which adds complexity and limits generalization, while pixel-space methods are computationally expensive and often omit attention mechanisms, which hinders modeling of long-range spatio-temporal dependencies. To address these limitations, the paper proposes RainDiff, a diffusion model that integrates Token-wise Attention into both the U-Net backbone and the spatio-temporal encoder for end-to-end radar echo sequence prediction, eliminating the need for a separate autoencoder. The design captures multi-scale spatial interactions and temporal evolution without the high resource cost typical of pixel-space diffusion. According to the authors, experiments and visual evaluations across diverse datasets show that the method significantly outperforms state-of-the-art approaches, with better local fidelity, generalization, and robustness in complex precipitation scenarios.

📝 Abstract
Precipitation nowcasting, the prediction of future radar echo sequences from current observations, is a critical yet challenging task due to the inherently chaotic and tightly coupled spatio-temporal dynamics of the atmosphere. While recent diffusion-based models attempt to capture both large-scale motion and fine-grained stochastic variability, they often suffer from scalability issues: latent-space approaches require a separately trained autoencoder, adding complexity and limiting generalization, while pixel-space approaches are computationally intensive and often omit attention mechanisms, reducing their ability to model long-range spatio-temporal dependencies. To address these limitations, we propose Token-wise Attention, integrated into not only the U-Net diffusion model but also the spatio-temporal encoder, which dynamically captures multi-scale spatial interactions and temporal evolution. Unlike prior approaches, our method natively integrates attention into the architecture without incurring the high resource cost typical of pixel-space diffusion, thereby eliminating the need for separate latent modules. Our extensive experiments and visual evaluations across diverse datasets demonstrate that the proposed method significantly outperforms state-of-the-art approaches, yielding superior local fidelity, generalization, and robustness in complex precipitation forecasting scenarios.
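
To make the cost argument concrete, the short sketch below counts the tokens in a radar sequence and the size of one dense self-attention score matrix over them. The input sizes (5 frames of 128×128 pixels), the `patch` parameter, and the function name `attention_cost` are illustrative assumptions, not values or names from the paper, and grouping pixels into patches is only a generic way to vary the token count, not a description of the paper's Token-wise Attention.

```python
def attention_cost(frames, height, width, patch=1, bytes_per_score=4):
    """Return (tokens, scores, bytes) for one dense self-attention map.

    Hypothetical helper: each patch x patch block of pixels in each frame
    is treated as one token, and every token attends to every other token.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by patch")
    tokens = frames * (height // patch) * (width // patch)
    scores = tokens * tokens              # one score per token pair
    return tokens, scores, scores * bytes_per_score


if __name__ == "__main__":
    # Assumed example input: 5 frames of 128 x 128 pixels.
    for patch in (1, 4, 16):
        tokens, scores, nbytes = attention_cost(5, 128, 128, patch=patch)
        print(f"patch={patch:2d}  tokens={tokens:6d}  "
              f"scores={scores:13d}  memory={nbytes / 2**20:10.2f} MiB")
```

With one token per pixel, this assumed input gives 81,920 tokens and a score matrix of 25 GiB at 4 bytes per score for a single head, which illustrates why dense attention is often left out of pixel-space models. The count grows with the square of the number of tokens, so any design that attends over fewer tokens reduces it sharply.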
Problem

Research questions and friction points this paper is trying to address.

Diffusion-based precipitation nowcasting models suffer from scalability issues
Latent-space approaches require a separately trained autoencoder, which adds complexity and limits generalization
Pixel-space approaches are computationally intensive and often omit attention, weakening long-range spatio-temporal modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-wise Attention integrated into both the U-Net diffusion model and the spatio-temporal encoder
Dynamically captures multi-scale spatial interactions and temporal evolution
Eliminates separate latent modules, avoiding the generalization limits of a separately trained autoencoder