🤖 AI Summary
To address the high computational cost of numerical models, the weak physical interpretability of machine learning approaches, and the insufficient fusion of heterogeneous data in urban air quality forecasting, this paper proposes a mechanism-guided tri-modal framework that integrates emissions, meteorology, and pollutant observations. The framework combines multi-source heterogeneous data alignment and embedding, mechanism-constrained graph neural networks, lightweight spatiotemporal autoregressive modeling, and dynamic frame interpolation compensation to produce 72-hour hourly forecasts for six key pollutants. Its combined autoregressive and frame interpolation strategy maintains physical consistency while keeping inference fast (25–30 seconds per 72-hour forecast). Evaluated on real-world megacity datasets, the method reduces mean absolute error (MAE) by 18.7% on average compared to state-of-the-art numerical models, demonstrating strong generalization. It delivers high-accuracy, low-latency decision support for smart-city low-carbon governance.
📝 Abstract
Air pollution has emerged as a major public health challenge in megacities. Numerical simulations and single-site machine learning approaches have been widely applied to air quality forecasting. However, these methods face several limitations, including high computational cost, low operational efficiency, and limited integration with observational data. With the rapid advancement of artificial intelligence, there is an urgent need for a low-cost, efficient air quality forecasting model for smart urban management. This study constructs an air quality forecasting model, named FuXi-Air, based on multimodal data fusion to support high-precision forecasting, and operates it in typical megacities. Guided by air pollution mechanisms, the model integrates meteorological forecasts, emission inventories, and pollutant monitoring data. By combining an autoregressive prediction framework with a frame interpolation strategy, the model completes 72-hour forecasts for six major air pollutants at hourly resolution across multiple monitoring sites within 25-30 seconds. In both computational efficiency and forecasting accuracy, it outperforms the mainstream numerical air quality models used in operational forecasting. Ablation experiments on key influencing factors show that, although meteorological data contribute more to model accuracy than emission inventories do, integrating multimodal data significantly improves forecasting precision and yields reliable predictions under the differing pollution mechanisms of different megacities. This study provides both a technical reference and a practical example for applying multimodal data-driven models to air quality forecasting, and offers new insights into building hybrid forecasting systems that support air pollution risk warning in smart city management.
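The pairing of an autoregressive rollout with frame interpolation described in the abstract can be illustrated with a minimal sketch. It is not the FuXi-Air implementation: the coarse step of 6 hours, the `toy_step` predictor, and the use of plain linear interpolation in place of a learned interpolation model are all assumptions made for illustration. Only the 72-hour horizon, the hourly output, and the six pollutants come from the abstract.

```python
import numpy as np

N_POLLUTANTS = 6    # six pollutants, per the abstract
HORIZON_H = 72      # 72-hour horizon, per the abstract
COARSE_STEP_H = 6   # ASSUMPTION: coarse autoregressive step, not stated in the source


def toy_step(state: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned one-step predictor."""
    return 0.9 * state + 1.0


def autoregressive_rollout(state0, step_fn, n_steps):
    """Feed each prediction back in as the input to the next step."""
    frames = [state0]
    for _ in range(n_steps):
        frames.append(step_fn(frames[-1]))
    return np.stack(frames)  # shape: (n_steps + 1, n_sites, n_pollutants)


def interpolate_hourly(frames, step_h):
    """Fill the gaps between coarse frames at hourly resolution.

    ASSUMPTION: linear interpolation replaces whatever interpolation
    model the paper actually uses.
    """
    coarse_t = np.arange(frames.shape[0]) * step_h
    hourly_t = np.arange(coarse_t[-1] + 1)
    flat = frames.reshape(frames.shape[0], -1)
    cols = [np.interp(hourly_t, coarse_t, flat[:, j]) for j in range(flat.shape[1])]
    return np.stack(cols, axis=1).reshape(len(hourly_t), *frames.shape[1:])


def forecast(state0):
    """Return hourly forecasts for hours 1..HORIZON_H."""
    coarse = autoregressive_rollout(state0, toy_step, HORIZON_H // COARSE_STEP_H)
    hourly = interpolate_hourly(coarse, COARSE_STEP_H)
    return hourly[1:]  # drop hour 0, which is the initial observation
```

Calling `forecast(np.zeros((4, N_POLLUTANTS)))` for four hypothetical sites returns an array of shape `(72, 4, 6)`. Running the predictor only 12 times instead of 72 is one plausible way such a scheme shortens error accumulation and inference time, at the cost of relying on the interpolation step for the intermediate hours.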