🤖 AI Summary
Hyperspectral image (HSI) classification is hampered by low spatial resolution and severe label scarcity. To address these challenges, we propose a label-efficient framework with three components: a frozen diffusion model, pre-trained on natural images, that extracts robust low-level spatial features; a spectral-aware FiLM modulation module that dynamically conditions these spatial features on spectral information, fusing features across modalities and domains; and a lightweight classification head optimized end-to-end. To our knowledge, this is the first work to effectively transfer low-level spatial representations from pre-trained diffusion models to HSI classification. Our method achieves state-of-the-art performance on two recent benchmarks using only extremely sparse annotations. Ablation studies confirm that both the diffusion-based feature transfer and the FiLM-based fusion strategy are critical, and that each significantly improves fine-grained land-cover classification under few-shot settings.
📝 Abstract
Hyperspectral imaging (HSI) enables detailed land cover classification, yet low spatial resolution and sparse annotations pose significant challenges. We present a label-efficient framework that leverages spatial features from a frozen diffusion model pretrained on natural images. Our approach extracts low-level representations from high-resolution decoder layers at early denoising timesteps, which transfer effectively to the low-texture structure of HSI. To integrate spectral and spatial information, we introduce a lightweight FiLM-based fusion module that adaptively modulates frozen spatial features using spectral cues, enabling robust multimodal learning under sparse supervision. Experiments on two recent hyperspectral datasets demonstrate that our method outperforms state-of-the-art approaches using only the provided sparse training labels. Ablation studies further highlight the benefits of diffusion-derived features and spectral-aware fusion. Overall, our results indicate that pretrained diffusion models can support domain-agnostic, label-efficient representation learning for remote sensing and broader scientific imaging tasks.
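To make the fusion step concrete, here is a minimal NumPy sketch of FiLM-style modulation, in which spectral input produces a per-channel scale γ and shift β that are applied to spatial features as γ ⊙ F + β. It is an illustration under stated assumptions, not the paper's implementation: the function name `film_modulate`, the tensor shapes, the single linear projection, and the per-pixel conditioning are all hypothetical, and random arrays stand in for the frozen diffusion features and the hyperspectral patch.

```python
import numpy as np

def film_modulate(spatial_feats, spectra, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM-style fusion: scale and shift spatial features using spectral cues.

    spatial_feats: (C, H, W) features, assumed to come from a frozen backbone
    spectra:       (B, H, W) per-pixel spectral vectors with B bands
    w_gamma, w_beta: (C, B) linear projections (hypothetical, learnable)
    b_gamma, b_beta: (C,) biases
    """
    # Project each pixel's spectrum to a per-channel scale (gamma) and shift (beta).
    gamma = np.einsum("cb,bhw->chw", w_gamma, spectra) + b_gamma[:, None, None]
    beta = np.einsum("cb,bhw->chw", w_beta, spectra) + b_beta[:, None, None]
    # Feature-wise linear modulation: the spatial features themselves stay frozen.
    return gamma * spatial_feats + beta

rng = np.random.default_rng(0)
C, B, H, W = 8, 16, 4, 4                          # illustrative sizes only
spatial_feats = rng.standard_normal((C, H, W))    # placeholder for diffusion features
spectra = rng.standard_normal((B, H, W))          # placeholder for an HSI patch

# Initialise near the identity map (gamma close to 1, beta close to 0).
w_gamma = 0.1 * rng.standard_normal((C, B))
b_gamma = np.ones(C)
w_beta = 0.1 * rng.standard_normal((C, B))
b_beta = np.zeros(C)

fused = film_modulate(spatial_feats, spectra, w_gamma, b_gamma, w_beta, b_beta)
print(fused.shape)  # (8, 4, 4)
```

In a setup like this, only the projection parameters and the downstream classification head would be trained, which keeps the number of learnable parameters small when labels are sparse.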