🤖 AI Summary
To address key bottlenecks in cloud detection for Visible–Shortwave Infrared (VSWIR) imaging spectrometers—including heavy reliance on spatial/temporal context, poor generalizability, and limited interpretability—this work proposes a purely spectral sequence-modeling approach. We introduce a novel spectroscopy-specific Transformer architecture featuring a physics-informed spectral attention mechanism, trained end-to-end on one-dimensional spectral inputs under supervised learning. The method achieves strong physical interpretability—evidenced by salient activation at water vapor, oxygen, and other physically relevant absorption bands—and enables zero-shot cross-instrument transfer without adaptation. With only 1–10% of the parameters of state-of-the-art models, it significantly outperforms the current baseline on the EMIT dataset while matching the accuracy of far larger machine learning models. This work establishes an efficient, robust, and interpretable paradigm for cloud identification in scenarios that are resource-constrained, temporally sparse, or lacking spatial context.
📝 Abstract
Current and upcoming generations of visible–shortwave infrared (VSWIR) imaging spectrometers promise unprecedented capacity to quantify Earth System processes across the globe. However, reliable cloud screening remains a fundamental challenge for these instruments, where traditional spatial and temporal approaches are constrained by cloud variability and sparse temporal coverage. The Spectroscopic Transformer (SpecTf) addresses these challenges with a spectroscopy-specific deep learning architecture that performs cloud detection using only spectral information (no spatial or temporal data are required). By treating spectral measurements as sequences rather than image channels, SpecTf learns fundamental physical relationships without relying on spatial context. Our experiments demonstrate that SpecTf significantly outperforms the current baseline approach implemented for the EMIT instrument, and performs comparably to other machine learning methods with orders of magnitude fewer learned parameters. Critically, we demonstrate SpecTf's inherent interpretability through its attention mechanism, revealing the physically meaningful spectral features the model has learned. Finally, we present SpecTf's potential for cross-instrument generalization by applying it to a different instrument on a different platform without modification, opening the door to instrument-agnostic, data-driven algorithms for future imaging spectroscopy tasks.
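The core idea above — treating each spectral band as a token in a sequence and reading the attention weights over bands to see which absorption features drive a prediction — can be illustrated with a minimal, dependency-free sketch of scaled dot-product self-attention. This is a generic toy illustration, not the SpecTf implementation; the token embeddings and dimensions below are made-up values for demonstration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over spectral tokens.

    Each token is a small embedding of one band (e.g. derived from its
    wavelength and measured reflectance). Returns the attended outputs
    and the per-token attention weights -- the weights are what make
    an attention-based model inspectable: bands receiving high weight
    are the ones the model is "looking at".
    """
    d = len(tokens[0])  # embedding dimension
    outputs, weights_all = [], []
    for q in tokens:
        # Similarity of this band's query to every band's key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        weights_all.append(w)
        # Weighted sum of value vectors
        outputs.append([sum(wi * v[j] for wi, v in zip(w, tokens))
                        for j in range(d)])
    return outputs, weights_all


# Hypothetical toy spectrum: three "bands", each embedded in 2-D.
tokens = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
out, attn = self_attention(tokens)
```

In a real spectral Transformer the queries, keys, and values would come from learned projections, but the interpretability mechanism is the same: each row of `attn` sums to 1 and can be plotted against wavelength to reveal which bands (e.g. water vapor or oxygen absorption features) the model attends to.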