🤖 AI Summary
Self-supervised reconstruction of hyperspectral images suffers from poor representation interpretability, weak generalization, and low data efficiency. Method: This paper embeds physics-informed priors, specifically the Linear Spectral Mixture Model (LSMM) and the Spectral Angle Mapper (SAM), into a Vision Transformer-based Masked Autoencoder (ViT-MAE) framework. It is the first to jointly formulate LSMM constraints and SAM-based geometric metrics as reconstruction objectives within an MAE, enabling end-to-end co-optimization of data-driven learning and physical modeling. A Huber loss jointly optimizes reconstruction fidelity and spectral geometric consistency. Contribution/Results: Under limited labeling, the method improves downstream classification and unmixing performance by over 8%, enhances training stability, and yields latent representations that adhere to linear mixing physics, significantly boosting few-shot robustness and physical interpretability.
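The summary above leans on the Spectral Angle Mapper as a geometric consistency metric. As a rough illustration only (the paper's own implementation is not shown here, and the function name is ours), the spectral angle between a reconstructed and a reference spectrum can be computed like this:

```python
import numpy as np

def spectral_angle(x, y, eps=1e-8):
    """SAM metric: angle (radians) between two spectra.

    0 means the spectra point in the same direction regardless of
    magnitude, which is why SAM is robust to illumination scaling.
    """
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps)
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return np.arccos(np.clip(cos_sim, -1.0, 1.0))
```

Because the angle ignores magnitude, a spectrum and any positive scalar multiple of it score an angle near zero, while orthogonal spectra score pi/2.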
📝 Abstract
Integrating domain knowledge into deep learning has emerged as a promising direction for improving model interpretability, generalization, and data efficiency. In this work, we present a novel knowledge-guided ViT-based Masked Autoencoder that embeds scientific domain knowledge in the self-supervised reconstruction process. Rather than relying solely on data-driven optimization, the proposed approach incorporates the Linear Spectral Mixture Model (LSMM) as a physical constraint and the physically grounded Spectral Angle Mapper (SAM) as a geometric metric, ensuring that learned representations adhere to known structural relationships between observed signals and their latent components. The framework jointly optimizes the LSMM and SAM losses alongside a conventional Huber loss objective, promoting both numerical accuracy and geometric consistency in the feature space. This knowledge-guided design enhances reconstruction fidelity, stabilizes training under limited supervision, and yields interpretable latent representations grounded in physical principles. Experimental results show that the proposed model substantially improves reconstruction quality and downstream task performance, highlighting the promise of embedding physics-informed inductive biases in transformer-based self-supervised learning.