Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities

📅 2026-05-19

🤖 AI Summary
This work addresses the performance degradation of multimodal remote sensing semantic segmentation when some modalities are missing, a problem compounded by conventional training that treats all modality-availability scenarios as equally informative. The authors propose a non-uniform scenario sampling strategy guided by the pretrained latent space. Specifically, they score each scenario by the distortion that the absent modalities induce in the shared latent representation, model relations between scenarios with a radial basis function kernel, and refine the scores through regularized kernel smoothing. The refined scores are converted into a probability distribution that steers fine-tuning toward more informative scenarios. The strategy is evaluated with CBC-SLP, CBC, and CMX backbones and outperforms standard fine-tuning and LoRA-based adaptation on the DSTL, Potsdam, and Hunan datasets, suggesting that the pretrained latent representation is an effective basis for scenario sampling.
📝 Abstract
Multimodal semantic segmentation benefits remote sensing analysis by combining complementary information from different sensor modalities. In real-world remote sensing applications, one or more modalities may be unavailable due to sensor failures, adverse atmospheric conditions, or data acquisition problems. Even with pretrained multimodal representations and existing fine-tuning or adaptation strategies, performance may remain limited because all modality availability scenarios are typically treated as equally informative during training. In this paper, we propose a novel training strategy that learns a scenario sampling distribution directly from the pretrained latent space. Instead of relying on uniform random modality dropout, the proposed method guides fine-tuning toward more informative modality availability scenarios. More specifically, we quantify the effect of each scenario independently based on the distortion it induces in the shared latent representation. We then capture scenario relations using a radial basis function kernel and derive refined scenario scores through a regularized kernel smoothing. These scores are then converted into a probability distribution during scenario sampling for fine-tuning. We evaluate this strategy on three remote sensing image sets, namely DSTL, Potsdam, and Hunan, using CBC-SLP, CBC, and CMX backbones. The experimental results with different image sets and backbones show that our method outperforms standard fine-tuning and LoRA-based adaptation. These findings suggest that the pretrained latent representation can serve as an effective basis for sampling during missing modality fine-tuning. Code is available at https://github.com/iremulku/Latent-Space-Guided-Scenario-Sampling
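The pipeline described in the abstract (distortion scores, an RBF kernel over scenarios, regularized kernel smoothing, conversion to sampling probabilities) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; their code is in the repository linked above. Everything specific here is an assumption made for illustration: the function name `scenario_sampling_distribution`, representing each scenario as a binary modality-availability mask, measuring distortion as mean squared distance to the full-modality latent, computing the kernel on the masks, using a kernel-ridge form for the smoothing, a softmax for the final conversion, and the choice that higher distortion yields higher sampling probability.

```python
import numpy as np


def scenario_sampling_distribution(z_full, z_by_scenario, masks,
                                   gamma=1.0, lam=0.1, temperature=1.0):
    """Hypothetical sketch: turn latent distortions into scenario probabilities.

    z_full:        (n, d) latents computed with all modalities present.
    z_by_scenario: (S, n, d) latents for each of S availability scenarios.
    masks:         (S, M) binary masks, 1 = modality available (assumed encoding).
    """
    z_full = np.asarray(z_full, dtype=float)
    z_by_scenario = np.asarray(z_by_scenario, dtype=float)
    masks = np.asarray(masks, dtype=float)

    # Step 1 (assumed metric): per-scenario distortion as the mean squared
    # distance between scenario latents and full-modality latents.
    distortion = ((z_by_scenario - z_full[None]) ** 2).sum(axis=-1).mean(axis=-1)

    # Step 2: RBF kernel between scenarios, here computed on the masks.
    sq_dist = ((masks[:, None, :] - masks[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-gamma * sq_dist)

    # Step 3 (assumed form): regularized kernel smoothing, written as a
    # kernel ridge smoother: refined = K (K + lam * I)^{-1} distortion.
    n_scenarios = kernel.shape[0]
    refined = kernel @ np.linalg.solve(kernel + lam * np.eye(n_scenarios), distortion)

    # Step 4 (assumed conversion): softmax over refined scores. Negate
    # `refined` here if low-distortion scenarios should be favored instead.
    logits = refined / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Usage: draw one scenario per fine-tuning step instead of uniform dropout.
rng = np.random.default_rng(0)
masks = np.array([[1, 1], [1, 0], [0, 1]])           # two modalities, three scenarios
z_full = rng.normal(size=(8, 4))                     # stand-in latents
z_by_scenario = np.stack([z_full, z_full + 0.1, z_full + 1.0])
probs = scenario_sampling_distribution(z_full, z_by_scenario, masks)
scenario_index = rng.choice(len(masks), p=probs)
```

In a training loop, the distribution would be computed once from the pretrained encoder and then used to pick which modalities to drop for each batch. The hyperparameters `gamma`, `lam`, and `temperature` are placeholders with no values taken from the paper.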
Problem

Research questions and friction points this paper is trying to address.

multimodal segmentation
missing modalities
remote sensing
latent space
scenario sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent space
scenario sampling
missing modalities
multimodal segmentation
kernel smoothing