🤖 AI Summary
Traditional Gaussian models underestimate extreme PM₂.₅ pollution events, leading to biased risk assessments. To model threshold-exceedance events in Greater London, this paper proposes a Bayesian hierarchical data fusion framework that integrates ground-based observations from the UK’s Automatic Urban and Rural Network (AURN) with remote-sensing data from EAC4, a reanalysis product of the Copernicus Atmosphere Monitoring Service (CAMS). Methodologically, the authors use a Dirac-delta generalised Pareto distribution, grounded in extreme value theory, to jointly describe observations that exceed the threshold and those that do not, while preserving the temporal structure of extreme events. The framework fully accounts for parameter uncertainty and combines data with differing spatio-temporal resolutions. The model outperforms both Gaussian-based alternatives and standalone remote-sensing data in predicting threshold exceedances at the majority of observation sites. It also captures greater PM₂.₅ variability and spatial structure, including higher concentrations near coastal areas that are not evident in the remote-sensing data alone.
📝 Abstract
Data fusion models are widely used in air quality monitoring to integrate in situ and remote-sensing data, offering spatially complete and temporally detailed estimates. However, traditional Gaussian-based models often underestimate extreme pollution values, leading to biased risk assessments. To address this, we present a Bayesian hierarchical data fusion framework rooted in extreme value theory, using the Dirac-delta generalised Pareto distribution to jointly account for threshold exceedances and non-exceedances while preserving the temporal structure of extreme events. Our model is used to describe and predict censored threshold exceedances of PM2.5 pollution in the Greater London region using remote-sensing observations from the EAC4 dataset, a reanalysis product from the Copernicus Atmosphere Monitoring Service (CAMS), and in situ observations from stations in the Automatic Urban and Rural Network (AURN) run by the UK government. Key innovations of our approach include combining data with varying spatio-temporal resolutions and fully accounting for parameter uncertainties. Results show that our model outperforms Gaussian-based alternatives and standalone remote-sensing data in predicting threshold exceedances at the majority of observation sites, and can even yield better spatial patterns of PM2.5 pollution than those discernible from the remote-sensing data. Moreover, our approach captures greater variability and spatial patterns, such as higher PM2.5 concentrations near coastal areas, which are not evident in remote-sensing data alone.
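To make the idea of censored threshold exceedances more concrete, the following is a minimal sketch of one common way to write a Dirac-delta generalised Pareto likelihood. It is not the paper's implementation. It assumes that observations at or below a threshold u are censored to zero and carry a point mass with probability 1 - p, while excesses above u follow a generalised Pareto distribution with scale sigma and shape xi. The function names, the parameterisation, and any threshold value passed in are illustrative assumptions. The hierarchical, spatio-temporal, and data fusion layers described in the abstract are not represented.

```python
import numpy as np


def censor(x, u):
    """Turn raw concentrations into censored excesses: max(x - u, 0).

    Values at or below the (hypothetical) threshold u become exactly 0,
    which is where the Dirac-delta point mass sits.
    """
    return np.maximum(np.asarray(x, dtype=float) - u, 0.0)


def gpd_logpdf(y, sigma, xi):
    """Log-density of the generalised Pareto distribution for excesses y > 0."""
    y = np.asarray(y, dtype=float)
    if abs(xi) < 1e-12:
        # Exponential limit of the GPD as the shape parameter tends to 0.
        return -np.log(sigma) - y / sigma
    z = 1.0 + xi * y / sigma
    with np.errstate(divide="ignore", invalid="ignore"):
        out = -np.log(sigma) - (1.0 / xi + 1.0) * np.log(z)
    # Outside the support (z <= 0) the density is zero.
    return np.where(z > 0, out, -np.inf)


def ddgpd_loglik(y, p, sigma, xi):
    """Log-likelihood of censored excesses under the assumed mixture.

    Assumed form: mass (1 - p) at y = 0, and p * GPD(sigma, xi) on y > 0.
    """
    y = np.asarray(y, dtype=float)
    exceed = y > 0
    n_exc = int(exceed.sum())
    n_zero = y.size - n_exc
    ll = n_zero * np.log1p(-p) + n_exc * np.log(p)
    ll += gpd_logpdf(y[exceed], sigma, xi).sum()
    return float(ll)
```

In a Bayesian setting such as the one the abstract describes, a likelihood of this kind would be combined with priors on p, sigma, and xi (possibly varying in space and time). The sketch evaluates only the likelihood for fixed parameter values.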