🤖 AI Summary
In hyperspectral remote sensing, dense prediction tasks—such as semantic segmentation and change detection—are severely hindered by the scarcity and high acquisition cost of pixel-level ground-truth annotations. To address this, we propose the first synthesis method for pixel-level semantic labeling of hyperspectral images (HSIs). Our approach introduces a novel dual-stream variational autoencoder (VAE) coupled with a conditional diffusion model, jointly modeling the latent-space joint distribution of HSIs and their corresponding semantic masks to enable co-generation of high-dimensional spectral imagery and precise pixel-wise labels. This is the first framework capable of end-to-end, semantic-mask-conditioned HSI generation. Evaluated on multiple benchmarks, our synthesized data significantly improves downstream model performance—achieving accuracy comparable to models trained on real annotated data—thereby demonstrating both effectiveness and strong generalization capability.
📝 Abstract
In hyperspectral remote sensing field, some downstream dense prediction tasks, such as semantic segmentation (SS) and change detection (CD), rely on supervised learning to improve model performance and require a large amount of manually annotated data for training. However, due to the needs of specific equipment and special application scenarios, the acquisition and annotation of hyperspectral images (HSIs) are often costly and time-consuming. To this end, our work explores the potential of generative diffusion model in synthesizing HSIs with pixel-level annotations. The main idea is to utilize a two-stream VAE to learn the latent representations of images and corresponding masks respectively, learn their joint distribution during the diffusion model training, and finally obtain the image and mask through their respective decoders. To the best of our knowledge, it is the first work to generate high-dimensional HSIs with annotations. Our proposed approach can be applied in various kinds of dataset generation. We select two of the most widely used dense prediction tasks: semantic segmentation and change detection, and generate datasets suitable for these tasks. Experiments demonstrate that our synthetic datasets have a positive impact on the improvement of these downstream tasks.