🤖 AI Summary
To address spectral shift across sensors and geographic regions in multispectral land cover classification (MLCC), as well as the limited generalization of existing methods built on small-scale models, this paper proposes a parameter-efficient fine-tuning paradigm for vision foundation models. The core innovation is MoLTE+FAF, a frequency-aware mixture of low-rank token experts that jointly enhances semantic representation and suppresses task-irrelevant noise through low-rank decomposition and frequency-domain filtering. Combined with dynamic expert routing and multispectral feature disentanglement, it overcomes generalization bottlenecks in large-model domain adaptation. Extensive experiments show significant gains over state-of-the-art (SOTA) methods on multi-source, cross-domain land cover classification. Moreover, the approach achieves SOTA performance on RGB remote sensing image semantic segmentation, demonstrating its broad applicability and robustness.
📝 Abstract
We introduce Land-MoE, a novel approach for multispectral land cover classification (MLCC). Spectral shift, which emerges from disparities in sensors and geospatial conditions, poses a significant challenge in this domain. Existing methods predominantly rely on domain adaptation and generalization strategies, often utilizing small-scale models that exhibit limited performance. In contrast, Land-MoE addresses these issues by hierarchically inserting Frequency-aware Mixture of Low-rank Token Experts modules to fine-tune Vision Foundation Models (VFMs) in a parameter-efficient manner. Specifically, Land-MoE comprises two key modules: the Mixture of Low-rank Token Experts (MoLTE) and Frequency-Aware Filters (FAF). MoLTE leverages rank-differentiated tokens to generate diverse feature adjustments for individual instances within multispectral images. By dynamically combining learnable low-rank token experts of varying ranks, it enhances robustness against spectral shifts. Meanwhile, FAF conducts frequency-domain modulation on the refined features. This process enables the model to effectively capture frequency bands that are strongly correlated with semantic content, while suppressing frequency noise irrelevant to the task. Comprehensive experiments on MLCC tasks involving cross-sensor and cross-geospatial setups demonstrate that Land-MoE outperforms existing methods by a large margin. Land-MoE also achieves state-of-the-art performance on domain-generalization semantic segmentation of RGB remote sensing images.
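To make the two modules concrete, the following is a minimal NumPy sketch of the idea described above, not the paper's implementation: several low-rank adapters of differing ranks act as token experts whose outputs are mixed by a per-token softmax router (MoLTE), and a frequency-domain mask then damps high-frequency components of the adjusted features (FAF). All shapes, the random weights, and the specific mask form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def molte_faf(tokens, ranks=(2, 4, 8), d=16):
    """Illustrative sketch (assumed details, not the paper's exact design).

    tokens: (n_tokens, d) features from a frozen backbone layer.
    Returns tokens plus a residual adjustment produced by rank-differentiated
    low-rank experts, mixed per token, then filtered in the frequency domain.
    """
    n, _ = tokens.shape

    # --- MoLTE: rank-differentiated low-rank token experts ---
    expert_outputs = []
    for r in ranks:
        A = rng.standard_normal((d, r)) * 0.1   # down-projection (rank r)
        B = rng.standard_normal((r, d)) * 0.1   # up-projection
        expert_outputs.append(tokens @ A @ B)   # low-rank feature adjustment

    gate_w = rng.standard_normal((d, len(ranks)))
    logits = tokens @ gate_w                    # per-token routing logits
    gates = np.exp(logits - logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)   # softmax over experts
    mixed = sum(g[:, None] * e for g, e in zip(gates.T, expert_outputs))

    # --- FAF: frequency-aware filtering along the token axis ---
    spectrum = np.fft.rfft(mixed, axis=0)       # token-sequence spectrum
    # Assumed mask form: attenuate higher frequencies (noise suppression);
    # in practice this would be learnable.
    mask = (1.0 / (1.0 + np.arange(spectrum.shape[0])))[:, None]
    filtered = np.fft.irfft(spectrum * mask, n=n, axis=0)

    return tokens + filtered                    # residual adjustment

x = rng.standard_normal((32, 16))
y = molte_faf(x)
print(y.shape)  # (32, 16)
```

The residual form (`tokens + filtered`) reflects the parameter-efficient fine-tuning setting: the frozen backbone features pass through unchanged, and only the small expert, router, and filter parameters would be trained.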