Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts

📅 2026-03-07
🤖 AI Summary
This work tackles the performance degradation that spectral shift across domains causes in remote sensing semantic segmentation. It proposes SpectralMoE, a framework that introduces, for the first time in remote sensing, a locally refined and spatially adaptive Mixture-of-Experts (MoE) mechanism. SpectralMoE employs a dual-gated MoE for parameter-efficient fine-tuning of foundation model features, while depth information estimated from the RGB bands guides spatially adaptive optimization. This design routes the visual and depth modalities independently and integrates them via cross-attention, overcoming the limitations of conventional global, homogeneous fine-tuning strategies. Extensive experiments show that SpectralMoE achieves state-of-the-art performance on multiple domain-generalized semantic segmentation benchmarks spanning hyperspectral, multispectral, and RGB remote sensing datasets.

📝 Abstract
Domain Generalization Semantic Segmentation (DGSS) in spectral remote sensing is severely challenged by spectral shifts across diverse acquisition conditions, which cause significant performance degradation for models deployed in unseen domains. While Parameter-Efficient Fine-Tuning (PEFT) on foundation models is a promising direction, existing methods employ global, homogeneous adjustments. This "one-size-fits-all" tuning struggles with the spatial heterogeneity of land cover, causing semantic confusion. We argue that the key to robust DGSS lies not in a single global adaptation, but in performing fine-grained, spatially-adaptive refinement of a foundation model's features. To achieve this, we propose SpectralMoE, a novel PEFT framework for DGSS. It operationalizes this principle by utilizing a Mixture-of-Experts (MoE) architecture to perform local precise refinement on the foundation model's features, incorporating depth features estimated from selected RGB bands of the spectral remote sensing imagery to guide the fine-tuning process. Specifically, SpectralMoE employs a dual-gated MoE architecture that independently routes visual and depth features to top-k selected experts for specialized refinement, enabling modality-specific adjustments. A subsequent cross-attention mechanism then judiciously fuses the refined structural cues into the visual stream, mitigating semantic ambiguities caused by spectral variations. Extensive experiments show that SpectralMoE sets a new state-of-the-art on multiple DGSS benchmarks across hyperspectral, multispectral, and RGB remote sensing imagery.
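The dual-gated routing and cross-attention fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: all shapes, the single-head attention, the shared expert pool, and every parameter name here are illustrative assumptions. Each modality (visual and depth) has its own gate that softmax-scores the experts, each token is refined by its top-k experts, and the refined depth cues are then fused into the visual stream through cross-attention with a residual connection.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_refine(tokens, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:    (n, d) token features of one modality
    gate_w:    (d, n_experts) gating projection for this modality
               (the "dual-gated" design gives each modality its own gate_w)
    expert_ws: list of (d, d) expert projections (hypothetical linear experts)
    """
    scores = softmax(tokens @ gate_w)            # (n, n_experts) gate weights
    topk = np.argsort(scores, axis=-1)[:, -k:]   # top-k expert indices per token
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        sel = topk[i]
        w = scores[i, sel] / scores[i, sel].sum()  # renormalize selected gates
        for j, wj in zip(sel, w):
            out[i] += wj * (tok @ expert_ws[j])    # weighted expert mixture
    return out

def cross_attention(q_feats, kv_feats):
    """Single-head cross-attention: visual queries attend to depth keys/values."""
    d = q_feats.shape[-1]
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(d))  # (n_q, n_kv)
    return attn @ kv_feats

n, d, n_experts = 4, 8, 4
vis = rng.standard_normal((n, d))   # visual tokens from the foundation model
dep = rng.standard_normal((n, d))   # depth tokens estimated from RGB bands

# Independent routers per modality (the "dual gates") over a shared expert pool.
gate_vis = rng.standard_normal((d, n_experts))
gate_dep = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]

vis_ref = moe_refine(vis, gate_vis, experts)
dep_ref = moe_refine(dep, gate_dep, experts)
# Residual fusion: refined depth cues are injected into the visual stream.
fused = vis_ref + cross_attention(vis_ref, dep_ref)
print(fused.shape)  # (4, 8)
```

In a real PEFT setting the experts would be small adapters attached to frozen foundation-model blocks, and the gates and experts would be the only trained parameters; the sketch above only shows the routing and fusion logic.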
Problem

Research questions and friction points this paper addresses.

Domain Generalization
Semantic Segmentation
Spectral Shifts
Remote Sensing
Foundation Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
Domain Generalization
Parameter-Efficient Fine-Tuning
Spectral Shift
Semantic Segmentation
Xi Chen
National University of Defense Technology

Maojun Zhang
Zhejiang University
Research interests: semantic communication, machine learning, wireless communication, AIGC

Yu Liu
National University of Defense Technology

Shen Yan
National University of Defense Technology