🤖 AI Summary
This study addresses the challenge of efficiently integrating multi-scale meteorological and pollution data for ultra-large-scale spatiotemporal PM2.5 prediction at 1 km resolution. To this end, the authors propose CRAN-PM, a dual-branch Vision Transformer featuring a novel cross-resolution attention architecture. By incorporating elevation-aware self-attention and wind-direction-guided cross-attention, the model learns physically consistent feature representations without explicitly ingesting physical variables. Evaluated on a 362-day European prediction task in 2022, CRAN-PM reduces RMSE by 4.7% and 10.7% for T+1 and T+3 forecasts, respectively, and decreases bias in complex terrain by 36%. Notably, it generates high-resolution prediction maps covering 29 million pixels in just 1.8 seconds on a single GPU.
📄 Abstract
Vision Transformers have achieved remarkable success in spatio-temporal prediction, but their scalability remains limited for the ultra-high-resolution, continent-scale domains required in real-world environmental monitoring. A single European air-quality map at 1 km resolution comprises 29 million pixels, far beyond the limits of naive self-attention. We introduce CRAN-PM, a dual-branch Vision Transformer that leverages cross-resolution attention to efficiently fuse global meteorological fields (25 km) with local high-resolution PM2.5 observations (1 km) at the current time. Rather than ingesting physical drivers such as temperature and topography as inputs, we introduce elevation-aware self-attention and wind-guided cross-attention that force the network to learn physically consistent feature representations for PM2.5 forecasting. CRAN-PM is fully trainable and memory-efficient, generating the complete 29-million-pixel European map in 1.8 seconds on a single GPU. Evaluated on daily PM2.5 forecasting across Europe in 2022 (362 days, 2,971 European Environment Agency (EEA) stations), it reduces RMSE by 4.7% at T+1 and 10.7% at T+3 relative to the best single-scale baseline, while reducing bias in complex terrain by 36%.
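The core idea of cross-resolution attention can be illustrated with a minimal sketch: fine-resolution (1 km) PM2.5 tokens act as queries that attend to a much smaller set of coarse-resolution (25 km) meteorological tokens as keys and values, so the quadratic attention cost scales with the small coarse grid rather than the 29-million-pixel fine grid. This is a hypothetical NumPy illustration with random projections standing in for learned weights, not the authors' implementation; the token counts, dimensions, and function name are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_resolution_attention(fine_tokens, coarse_tokens, d_k=32, seed=0):
    """Fine-grid tokens (queries) attend to coarse-grid tokens (keys/values).

    fine_tokens:   (N_fine, d_f) embeddings of 1 km PM2.5 patches
    coarse_tokens: (N_coarse, d_c) embeddings of 25 km meteorological patches
    Random projection matrices stand in for learned weights.
    """
    rng = np.random.default_rng(seed)
    d_f = fine_tokens.shape[-1]
    d_c = coarse_tokens.shape[-1]
    Wq = rng.standard_normal((d_f, d_k)) / np.sqrt(d_f)
    Wk = rng.standard_normal((d_c, d_k)) / np.sqrt(d_c)
    Wv = rng.standard_normal((d_c, d_k)) / np.sqrt(d_c)
    Q = fine_tokens @ Wq                       # (N_fine, d_k)
    K = coarse_tokens @ Wk                     # (N_coarse, d_k)
    V = coarse_tokens @ Wv                     # (N_coarse, d_k)
    # Attention matrix is only (N_fine, N_coarse): cost is linear in the
    # fine grid, quadratic only in the small coarse grid.
    attn = softmax(Q @ K.T / np.sqrt(d_k))
    return attn @ V                            # (N_fine, d_k)

# Toy sizes: 256 fine (1 km) patches attending to 16 coarse (25 km) patches.
fine = np.random.default_rng(1).standard_normal((256, 64))
coarse = np.random.default_rng(2).standard_normal((16, 48))
out = cross_resolution_attention(fine, coarse)
print(out.shape)  # → (256, 32)
```

The paper's elevation-aware and wind-guided variants would additionally bias the attention logits with terrain and wind-direction information; that biasing step is omitted here for brevity.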