Hyperspectral Variational Autoencoders for Joint Data Compression and Component Extraction

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the massive data volumes and storage/transmission bottlenecks of geostationary hyperspectral satellites (e.g., NASA's TEMPO), this paper proposes a variational autoencoder (VAE)-based framework for joint compression and atmospheric retrieval. The method unifies, for the first time, hyperspectral data compression with latent-space extraction of Level-2 atmospheric products (NO₂, O₃, HCHO, and cloud fraction). The authors find that atmospheric information is encoded semi-linearly in the latent space: nonlinear latent probes significantly outperform linear ones, while explicit supervision yields only marginal gains in latent-space quality, revealing fundamental information-preservation constraints in neural compression. The approach achieves an ultra-high compression ratio of 514×, with reconstruction errors one to two orders of magnitude below the signal. Retrieval accuracy reaches R² = 0.93 for cloud fraction and R² = 0.81 for total ozone, demonstrating efficient preservation of critical atmospheric signals.

📝 Abstract
Geostationary hyperspectral satellites generate terabytes of data daily, creating critical challenges for storage, transmission, and distribution to the scientific community. We present a variational autoencoder (VAE) approach that achieves 514× compression of NASA's TEMPO satellite hyperspectral observations (1028 channels, 290–490 nm) with reconstruction errors 1–2 orders of magnitude below the signal across all wavelengths. This dramatic data volume reduction enables efficient archival and sharing of satellite observations while preserving spectral fidelity. Beyond compression, we investigate to what extent atmospheric information is retained in the compressed latent space by training linear and nonlinear probes to extract Level-2 products (NO₂, O₃, HCHO, cloud fraction). Cloud fraction and total ozone achieve strong extraction performance (R² = 0.93 and 0.81, respectively), though these represent relatively straightforward retrievals given their distinct spectral signatures. In contrast, tropospheric trace gases pose genuine challenges for extraction (NO₂ R² = 0.20, HCHO R² = 0.51), reflecting their weaker signals and complex atmospheric interactions. Critically, we find the VAE encodes atmospheric information in a semi-linear manner (nonlinear probes substantially outperform linear ones) and that explicit latent supervision during training provides minimal improvement, revealing fundamental encoding challenges for certain products. This work demonstrates that neural compression can dramatically reduce hyperspectral data volumes while preserving key atmospheric signals, addressing a critical bottleneck for next-generation Earth observation systems. Code: https://github.com/cfpark00/Hyperspectral-VAE
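As a concrete illustration of the pipeline the abstract describes, the sketch below runs a single untrained VAE forward pass (encode to Gaussian parameters, reparameterize, decode) over toy spectra with 1028 channels. The latent width, batch size, and single-layer weights are illustrative assumptions, not the paper's architecture, and the 514× ratio reported in the paper also depends on storage precision, not channel count alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes loosely follow the paper: 1028 spectral channels per pixel.
# latent_dim and the linear encoder/decoder are placeholder assumptions.
n_channels, latent_dim = 1028, 8
x = rng.normal(size=(4, n_channels))  # toy batch of 4 spectra

# Untrained linear encoder producing Gaussian parameters (mu, log-variance).
W_mu = rng.normal(scale=0.01, size=(n_channels, latent_dim))
W_lv = rng.normal(scale=0.01, size=(n_channels, latent_dim))
mu, logvar = x @ W_mu, x @ W_lv

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * logvar) * eps

# Untrained linear decoder mapping the latent back to 1028 channels.
W_dec = rng.normal(scale=0.01, size=(latent_dim, n_channels))
x_hat = z @ W_dec

print(z.shape, x_hat.shape)  # → (4, 8) (4, 1028)
```

A trained version would replace the random linear maps with deep networks and optimize reconstruction plus KL losses; the point here is only the data flow from full spectrum to compact latent and back.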
Problem

Research questions and friction points this paper is trying to address.

Compressing terabytes of hyperspectral satellite data for efficient storage and transmission
Extracting atmospheric components from compressed data while preserving spectral fidelity
Understanding the encoding limits of neural compression for weak atmospheric signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Variational autoencoder achieves 514× hyperspectral data compression
Latent space preserves atmospheric information for component extraction
Nonlinear probes outperform linear ones for atmospheric signal retrieval
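The linear-vs-nonlinear probe comparison above can be sketched on synthetic data. Below, a closed-form least-squares probe is fit once on raw latents (linear) and once on quadratic features (a cheap stand-in for the paper's nonlinear probe networks); when the latent-to-product mapping is nonlinear, only the second recovers a high R². The latents, target, and feature map are toy assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy latents z and a target depending on them nonlinearly, standing in
# for a Level-2 product (e.g. cloud fraction) probed from a VAE latent.
z = rng.normal(size=(2000, 4))
y = z[:, 0] ** 2 + 0.5 * z[:, 1] + 0.1 * rng.normal(size=2000)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def probe(features, y):
    # Ordinary least squares with a bias column, solved in closed form.
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return r2(y, X @ coef)

r2_linear = probe(z, y)                                 # raw latents only
r2_nonlinear = probe(np.column_stack([z, z ** 2]), y)   # quadratic features

print(f"linear R^2 = {r2_linear:.2f}, nonlinear R^2 = {r2_nonlinear:.2f}")
```

The linear probe captures only the 0.5·z₁ term, while the quadratic-feature probe also recovers the z₀² dependence, mirroring the "semi-linear encoding" finding: some information is linearly readable, the rest needs a nonlinear readout.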
Core Francisco Park
Harvard University
AI for Science, Science of Deep Learning
Manuel Perez-Carrasco
Center for Astrophysics | Harvard & Smithsonian, Cambridge, MA, USA
Caroline Nowlan
Center for Astrophysics | Harvard & Smithsonian, Cambridge, MA, USA
Cecilia Garraffo
Center for Astrophysics | Harvard & Smithsonian, Cambridge, MA, USA