🤖 AI Summary
To address the data-volume, storage, and transmission bottlenecks of geostationary hyperspectral satellites (e.g., NASA's TEMPO), this paper proposes a variational autoencoder (VAE) compression framework and studies how well Level-2 atmospheric products (NO₂, O₃, HCHO, and cloud fraction) can be extracted from its latent space. The authors find that atmospheric information is encoded semi-linearly: nonlinear latent probes substantially outperform linear ones, while explicit latent supervision during training yields only marginal gains, pointing to fundamental limits on what the compressed representation preserves for certain products. The approach achieves a compression ratio of 514×, with reconstruction errors one to two orders of magnitude below the signal across all wavelengths. Extraction accuracy reaches R² = 0.93 for cloud fraction and R² = 0.81 for total ozone, but remains much lower for tropospheric trace gases (NO₂ R² = 0.20, HCHO R² = 0.51).
📝 Abstract
Geostationary hyperspectral satellites generate terabytes of data daily, creating critical challenges for storage, transmission, and distribution to the scientific community. We present a variational autoencoder (VAE) approach that achieves 514× compression of NASA's TEMPO satellite hyperspectral observations (1028 channels, 290-490 nm) with reconstruction errors 1-2 orders of magnitude below the signal across all wavelengths. This dramatic data volume reduction enables efficient archival and sharing of satellite observations while preserving spectral fidelity. Beyond compression, we investigate to what extent atmospheric information is retained in the compressed latent space by training linear and nonlinear probes to extract Level-2 products (NO2, O3, HCHO, cloud fraction). Cloud fraction and total ozone achieve strong extraction performance (R^2 = 0.93 and 0.81, respectively), though these represent relatively straightforward retrievals given their distinct spectral signatures. In contrast, tropospheric trace gases pose genuine challenges for extraction (NO2 R^2 = 0.20, HCHO R^2 = 0.51), reflecting their weaker signals and complex atmospheric interactions. Critically, we find that the VAE encodes atmospheric information in a semi-linear manner, with nonlinear probes substantially outperforming linear ones, and that explicit latent supervision during training provides minimal improvement, revealing fundamental encoding challenges for certain products. This work demonstrates that neural compression can dramatically reduce hyperspectral data volumes while preserving key atmospheric signals, addressing a critical bottleneck for next-generation Earth observation systems. Code: https://github.com/cfpark00/Hyperspectral-VAE
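The comparison between linear and nonlinear probes can be illustrated with a minimal sketch. This is not the authors' implementation: the latents and target below are synthetic stand-ins, the latent dimension of 8 is arbitrary, and a quadratic feature expansion fitted by least squares is used here as a simple substitute for whatever nonlinear probe the paper trains. All function names are hypothetical.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_probe(features, y):
    """Least-squares fit of y on features plus a bias column."""
    design = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights

def predict(weights, features):
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ weights

def quadratic_features(z):
    """Latents plus all pairwise products z_i * z_j with i <= j."""
    i, j = np.triu_indices(z.shape[1])
    return np.hstack([z, z[:, i] * z[:, j]])

# Synthetic stand-in for VAE latents (assumption: 8 latent dimensions).
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 8))

# Synthetic "Level-2 product" that depends partly nonlinearly on the latents.
y = z[:, 0] + 0.5 * z[:, 1] * z[:, 2] + z[:, 3] ** 2 + 0.1 * rng.normal(size=2000)

train, test = slice(0, 1500), slice(1500, None)

# Linear probe: regress the target directly on the latents.
w_lin = fit_probe(z[train], y[train])
r2_lin = r2_score(y[test], predict(w_lin, z[test]))

# Nonlinear probe (stand-in): same regression on quadratic features.
phi = quadratic_features(z)
w_non = fit_probe(phi[train], y[train])
r2_non = r2_score(y[test], predict(w_non, phi[test]))

print(f"linear probe R^2:    {r2_lin:.2f}")
print(f"nonlinear probe R^2: {r2_non:.2f}")
```

When the target depends on the latents only partly linearly, as constructed here, the linear probe recovers just the linear component and the nonlinear probe scores markedly higher on held-out data. That gap is the kind of evidence the abstract describes as "semi-linear" encoding.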