🤖 AI Summary
Remote sensing foundation models (RSFMs) generalize poorly across diverse spectral modalities (e.g., multispectral and hyperspectral) and rely heavily on large-scale spectral pretraining. To address this, we propose SpectralX, a parameter-efficient fine-tuning framework comprising: (1) a Hyper Tokenizer (HyperT) that explicitly extracts attribute tokens from the spatial and spectral dimensions; (2) an Attribute-oriented Mixture of Adapter (AoMoA) that dynamically aggregates multi-attribute expert knowledge; and (3) an Attribute-refined Adapter (Are-adapter) that iteratively sharpens the focus on task-relevant semantics. SpectralX combines a masked-reconstruction first stage with layer-wise adapters, so an existing RSFM can be adapted across domains with only lightweight fine-tuning. Evaluated on multispectral and hyperspectral semantic segmentation, SpectralX improves domain generalization and supports inference on imagery from new geographical regions and seasons. It offers a route to spectral modality transfer that reduces dependency on modality-specific pretraining while improving adaptability and scalability.
📝 Abstract
Recent advances in Remote Sensing Foundation Models (RSFMs) have led to significant breakthroughs in the field. While many RSFMs have been pretrained on massive optical imagery, multispectral and hyperspectral data still lack corresponding foundation models. To leverage the advantages of spectral imagery in earth observation, we explore whether existing RSFMs can be effectively adapted to process diverse spectral modalities without requiring extensive spectral pretraining. In response to this challenge, we propose SpectralX, a parameter-efficient fine-tuning framework that adapts existing RSFMs as the backbone and introduces a two-stage training approach to handle various spectral inputs, thereby significantly improving domain generalization performance. In the first stage, we employ a masked-reconstruction task and design a specialized Hyper Tokenizer (HyperT) to extract attribute tokens from both the spatial and spectral dimensions. Simultaneously, we develop an Attribute-oriented Mixture of Adapter (AoMoA) that dynamically aggregates multi-attribute expert knowledge while performing layer-wise fine-tuning. In the second stage, with semantic segmentation as the downstream task, we insert an Attribute-refined Adapter (Are-adapter) into the first-stage framework. By iteratively querying low-level semantic features with high-level representations, the model learns to focus on task-beneficial attributes, enabling customized adjustment of RSFMs. Following this two-stage adaptation process, SpectralX is capable of interpreting spectral imagery from new regions or seasons. The code will be available at: https://github.com/YuxiangZhang-BIT.
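To make the idea of dynamically aggregating expert adapters more concrete, the following is a minimal, illustrative sketch of a generic gated mixture of bottleneck adapters in plain Python. It is not the SpectralX implementation: the class names (`BottleneckAdapter`, `MixtureOfAdapters`), the softmax gate driven by an attribute token, the ReLU bottleneck, and all dimensions are assumptions chosen for illustration, and the frozen RSFM backbone is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class BottleneckAdapter:
    """Hypothetical expert: down-projection, ReLU, up-projection."""

    def __init__(self, dim, bottleneck, rng):
        self.down = random_matrix(bottleneck, dim, rng)
        self.up = random_matrix(dim, bottleneck, rng)

    def __call__(self, token):
        hidden = [max(0.0, h) for h in matvec(self.down, token)]
        return matvec(self.up, hidden)


class MixtureOfAdapters:
    """Hypothetical mixture: a gate weights each expert's output."""

    def __init__(self, dim, bottleneck, num_experts, seed=0):
        rng = random.Random(seed)
        self.experts = [BottleneckAdapter(dim, bottleneck, rng) for _ in range(num_experts)]
        self.gate = random_matrix(num_experts, dim, rng)

    def gate_weights(self, attribute_token):
        # Assumption: the gate is conditioned on an attribute token.
        return softmax(matvec(self.gate, attribute_token))

    def __call__(self, token, attribute_token):
        weights = self.gate_weights(attribute_token)
        outputs = [expert(token) for expert in self.experts]
        mixed = [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(token))
        ]
        # Residual connection: the backbone feature is kept and only adjusted.
        return [t + m for t, m in zip(token, mixed)]


moa = MixtureOfAdapters(dim=8, bottleneck=2, num_experts=3)
token = [0.5] * 8                                 # a backbone feature token
attribute_token = [0.1 * i for i in range(8)]     # a made-up attribute token
weights = moa.gate_weights(attribute_token)
adapted = moa(token, attribute_token)
```

In a parameter-efficient setup of this kind, only the adapter and gate weights would be trained while the backbone stays frozen, which is what keeps the fine-tuning lightweight.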