🤖 AI Summary
Problem: Large-scale spatiotemporal atmospheric remote sensing data, specifically four years of Sentinel-5P tropospheric NO₂ measurements over the contiguous United States, contain extensive gaps caused by persistent cloud cover, which complicates environmental monitoring and analysis.
Method: This study first identifies a pronounced low-rank structure in such data when represented as tensors. Leveraging this property, we propose a Low-Rank Tensor Model (LRTM) that fits a CANDECOMP/PARAFAC (CP) decomposition by Alternating Least Squares (ALS) to reconstruct missing entries that span long periods and large areas. We benchmark LRTM against geostatistical interpolation methods.
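The summary states that the data show pronounced low-rank structure but does not say how this was measured. One common diagnostic, assumed here rather than taken from the paper, is to unfold the tensor along each mode and check what fraction of its squared Frobenius norm the leading singular values carry. The NumPy sketch below assumes a gap-free (or already gap-filled) 3-way tensor, for example with hypothetical axes latitude × longitude × time; the helper names `unfold` and `energy_captured` are illustrative, not from the paper.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest.

    Row i of the result holds every entry of X whose index along `mode` is i.
    """
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def energy_captured(X, k):
    """For each mode, the fraction of ||X||_F^2 carried by the top-k singular values.

    The squared singular values of any unfolding sum to ||X||_F^2, so a value
    near 1.0 means that mode is well described by k components.
    """
    fractions = []
    for mode in range(X.ndim):
        s = np.linalg.svd(unfold(X, mode), compute_uv=False)  # sorted descending
        fractions.append(float(np.sum(s[:k] ** 2) / np.sum(s ** 2)))
    return fractions
```

A tensor with strong low-rank structure concentrates nearly all of its energy in a few singular values per mode, whereas uncorrelated noise spreads it across all of them. This check bounds the multilinear (Tucker) rank rather than the CP rank directly, but the two are related: a rank-R CP tensor has every unfolding of rank at most R.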
Contribution/Results: Experiments show that LRTM achieves superior reconstruction accuracy and robustness, especially when predicting extreme values and capturing spatiotemporal variation patterns, and that it supports pollution-hotspot identification and anomaly detection. The work introduces low-rank tensor completion as an approach to atmospheric data imputation and extends the applicability of low-rank tensor modeling in Earth system science.
📝 Abstract
In this study, we investigate for the first time the low-rank properties of a large-scale, spatiotemporally dynamic atmospheric variable represented as a tensor. We focus on the Sentinel-5P tropospheric NO₂ product (S5P-TN) over a four-year period in an area that encompasses the contiguous United States (CONUS), and we demonstrate that a low-rank approximation of such a dynamic variable is feasible. We exploit the low-rank properties of the S5P-TN data to inpaint gaps in the Sentinel-5P product by adopting a low-rank tensor model (LRTM) based on the CANDECOMP/PARAFAC (CP) decomposition and alternating least squares (ALS). Furthermore, we evaluate the LRTM's results by comparing them with spatial interpolation using geostatistics, and we conduct a comprehensive spatial statistical and temporal analysis of the S5P-TN product. The results demonstrate that, when the data are tensorized over extended spatial and temporal scales, tensor completion successfully reconstructs the missing values in the S5P-TN product, particularly in the presence of extended cloud obscuration, and that it predicts outliers and identifies hotspots.
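The abstract names the ingredients of the LRTM (a CP decomposition fitted by ALS) but not its update scheme. The NumPy sketch below shows one standard way to use CP-ALS for completion: an EM-style loop in which the gaps are refilled with the current CP reconstruction after every ALS sweep. It is an illustration under that assumption, not the authors' implementation; the function names, the random initialization, and the fixed iteration count are hypothetical choices, and the paper's model may weight or regularize the fit differently.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Khatri-Rao product; column r equals np.kron(B[:, r], C[:, r])."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def cp_to_tensor(A, B, C):
    """Rebuild a 3-way tensor from CP factors: sum over r of a_r o b_r o c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_als_complete(X, mask, rank, n_iter=300, seed=0):
    """Fill the entries of the 3-way tensor X where the boolean `mask` is False.

    X may hold NaN at unobserved positions; only X[mask] is read.
    Returns (CP reconstruction, X with its gaps filled from that reconstruction).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    # Start by filling the gaps with the mean of the observed entries.
    filled = np.where(mask, X, X[mask].mean())
    for _ in range(n_iter):
        # Each factor update is a linear least-squares problem on one unfolding.
        X0 = filled.reshape(I, J * K)                     # mode-0 unfolding
        A = np.linalg.lstsq(khatri_rao(B, C), X0.T, rcond=None)[0].T
        X1 = np.moveaxis(filled, 1, 0).reshape(J, I * K)  # mode-1 unfolding
        B = np.linalg.lstsq(khatri_rao(A, C), X1.T, rcond=None)[0].T
        X2 = np.moveaxis(filled, 2, 0).reshape(K, I * J)  # mode-2 unfolding
        C = np.linalg.lstsq(khatri_rao(A, B), X2.T, rcond=None)[0].T
        # Imputation step: keep observed values, replace gaps with the model.
        filled = np.where(mask, X, cp_to_tensor(A, B, C))
    return cp_to_tensor(A, B, C), filled
```

Observed entries are never overwritten, so only the gaps take values from the low-rank model. In practice the CP rank and the number of iterations would need to be chosen by validation, for example by hiding a subset of observed pixels and measuring the reconstruction error on them.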