🤖 AI Summary
Long-horizon (48–120 hr) PM₂.₅ concentration forecasting over East Asia's complex terrain and strong dynamic regimes suffers from low accuracy and high false-alarm rates, hindering public health early warning. This paper constructs a high-resolution CMAQ-OBS fused dataset, which reduces regional RMSE by 59.5%, and proposes a Group-Relative Policy Optimization (GRPO) reinforcement learning framework. GRPO incorporates a class-aware reward function and a curriculum-based rollout mechanism to explicitly model the asymmetric costs of false alarms (which erode public trust) and missed detections (which endanger health), while dynamically calibrating predictions by integrating physics-informed model priors with regional observational data. Compared to supervised fine-tuning baselines, GRPO reduces the false-alarm rate by 47.3% while maintaining a competitive F1 score. The method significantly enhances the reliability and operational utility of long-range air quality forecasting systems for public health decision-making.
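The class-aware reward idea above can be illustrated with a minimal sketch. The threshold, penalty weights, and function name below are illustrative assumptions, not the paper's actual design; the point is only that missed severe events and false alarms are penalized with separate, tunable weights on top of a point-wise accuracy term:

```python
import numpy as np

# All constants are illustrative assumptions, not the paper's values.
SEVERE_THRESHOLD = 75.0   # µg/m³ cutoff for a "severe" PM2.5 event (assumed)
W_MISS = 2.0              # penalty weight for missed severe events (assumed)
W_FALSE_ALARM = 1.0       # penalty weight for false alarms (assumed)

def class_aware_reward(pred, obs):
    """Score a forecast with asymmetric event-level penalties.

    pred, obs: array-like PM2.5 concentrations (µg/m³) over a region/horizon.
    Returns a scalar reward (higher is better).
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    base = -np.abs(pred - obs).mean()            # point-wise accuracy term
    pred_evt = pred >= SEVERE_THRESHOLD          # predicted severe events
    obs_evt = obs >= SEVERE_THRESHOLD            # observed severe events
    misses = np.mean(obs_evt & ~pred_evt)        # severe event not forecast
    false_alarms = np.mean(pred_evt & ~obs_evt)  # forecast event never occurred
    return base - W_MISS * misses - W_FALSE_ALARM * false_alarms
```

Because the two class-level penalties are weighted independently, the reward can encode the operational priority that a missed hazardous episode costs more (or less) than a spurious warning, which a plain RMSE objective cannot express.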
📝 Abstract
Accurate long-horizon forecasting of particulate matter (PM) concentration fields is essential for operational public health decisions. However, reliable forecasts remain challenging in regions with complex terrain and strong atmospheric dynamics such as East Asia. While foundation models such as Aurora offer global generality, they often miss region-specific dynamics and rely on non-real-time inputs, limiting their practical utility for localized warning systems. To address this gap, we construct and release a high-resolution CMAQ-OBS dataset that fuses model output with real-world observations over East Asia, reducing regional error by 59.5% and enabling the real-time 48–120 hour forecasts critical for public health alerts. However, standard point-wise objectives cannot reflect asymmetric operational costs, where false alarms erode public trust while missed severe events endanger populations. This cost mismatch causes SFT models to over-predict and yield high False Alarm Rates. We introduce Group-Relative Policy Optimization (GRPO) with class-wise rewards and curriculum rollout to align predictions with operational priorities. Experimental results demonstrate that our framework significantly improves forecast reliability. Compared to the SFT-only baseline, our model reduces the False Alarm Rate by 47.3% while achieving a competitive F1-score, proving its effectiveness for practical, real-world air quality forecasting systems in long-lead-time scenarios.
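The "group-relative" part of GRPO refers to computing advantages from a group of rollouts for the same input rather than from a learned value critic. A minimal sketch of that normalization step, under the standard GRPO formulation (the function name and epsilon are illustrative; the paper's exact rollout grouping is not specified here):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Turn per-rollout rewards into group-relative advantages.

    Given the rewards of G forecast rollouts sampled for the same input,
    each rollout's advantage is its reward's z-score within the group.
    Rollouts better than the group mean get positive advantage, worse
    rollouts negative, without training a separate value network.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

These advantages then weight the policy-gradient update, so the model is pushed toward rollouts that scored above their own group's average, e.g. under the class-aware reward described in the abstract.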