🤖 AI Summary
Spectral mixing in hyperspectral imagery obscures pure endmember signatures, limiting the accuracy of pixel-wise material identification and abundance estimation. To address this, we propose a variational autoencoder framework that integrates physical constraints with deep generative modeling. Our method introduces a bundled endmember mechanism that models endmember spectral variability through mean and structured-covariance representations; enforces non-negativity and sum-to-one abundance constraints through a Dirichlet latent prior; incorporates a Transformer encoder to capture global spatial-spectral context; and employs patch-level covariance modeling for dynamic endmember adaptation. Evaluated on the Samson, Jasper Ridge, and HYDICE Urban datasets, our approach consistently outperforms state-of-the-art methods, reducing abundance estimation RMSE by 12.7% on average and spectral angle distance by 9.3%. The framework achieves strong physical interpretability while enabling end-to-end joint optimization of spectral unmixing.
📝 Abstract
Hyperspectral images capture rich spectral information that enables per-pixel material identification; however, spectral mixing often obscures pure material signatures. To address this challenge, we propose the Latent Dirichlet Transformer Variational Autoencoder (LDVAE-T) for hyperspectral unmixing. Our model combines the global context modeling capabilities of transformer architectures with physically meaningful constraints imposed by a Dirichlet prior in the latent space. This prior naturally enforces the sum-to-one and non-negativity conditions essential for abundance estimation, thereby improving the quality of predicted mixing ratios. A key contribution of LDVAE-T is its treatment of materials as bundled endmembers, rather than relying on fixed ground-truth spectra. In the proposed method, the decoder predicts, for each endmember and each patch, a mean spectrum together with a structured (segmentwise) covariance that captures correlated spectral variability. Reconstructions are formed by mixing these learned bundles with Dirichlet-distributed abundances inferred by a transformer encoder, allowing the model to represent intrinsic material variability while preserving physical interpretability. We evaluate our approach on three benchmark datasets (Samson, Jasper Ridge, and HYDICE Urban) and show that LDVAE-T consistently outperforms state-of-the-art models in abundance estimation and endmember extraction, as measured by root mean squared error and spectral angle distance, respectively.
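The generative step described in the abstract (Dirichlet-distributed abundances mixing endmember bundles drawn from a mean plus structured covariance) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: all names, sizes, and the low-rank covariance factorization are assumptions chosen for brevity, and the Dirichlet concentration parameters are sampled directly rather than inferred by the transformer encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

P, B = 3, 50  # number of endmembers and spectral bands (illustrative sizes)

# Hypothetical learned bundle parameters: a mean spectrum per endmember,
# plus low-rank factors F_p standing in for a structured covariance F_p F_p^T.
means = rng.uniform(0.1, 0.9, size=(P, B))
factors = 0.05 * rng.standard_normal((P, B, 2))

def sample_bundle(means, factors, rng):
    """Draw one endmember realization per material: e_p = m_p + F_p z_p."""
    z = rng.standard_normal((factors.shape[0], factors.shape[2]))
    return means + np.einsum("pbr,pr->pb", factors, z)

def reconstruct_pixel(abundances, endmembers):
    """Linear mixing model: x = sum_p a_p * e_p, with a on the simplex."""
    return abundances @ endmembers

# Dirichlet samples satisfy non-negativity and sum-to-one by construction,
# which is exactly the physical constraint the latent prior enforces.
a = rng.dirichlet(alpha=np.ones(P))
E = sample_bundle(means, factors, rng)  # one draw from each endmember bundle
x = reconstruct_pixel(a, E)             # reconstructed pixel spectrum
```

Because the abundances come from a Dirichlet draw, the simplex constraints hold automatically rather than being imposed by a post-hoc normalization layer.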