🤖 AI Summary
Remote sensing image segmentation struggles to distinguish morphologically similar categories and to adapt to diverse scenes, particularly in fine-grained tasks such as cloud thickness classification, where existing methods cannot dynamically model context-aware semantic embeddings. To address this, we propose a dynamic dictionary learning framework: (1) multi-stage alternating cross-attention iteratively refines image features and class-aware semantic embeddings; (2) a novel differentiable dynamic dictionary construction mechanism, jointly optimized with dictionary-level contrastive constraints, explicitly balances intra-class heterogeneity and inter-class homogeneity, overcoming the limitations of implicit representation learning. Extensive experiments demonstrate state-of-the-art performance on the LoveDA and UAVid online test sets, with consistent improvements across both coarse- and fine-grained benchmarks, including cloud thickness segmentation.
📝 Abstract
Remote sensing image segmentation faces persistent challenges in distinguishing morphologically similar categories and adapting to diverse scene variations. Existing methods rely on implicit representation learning paradigms and often fail to adjust semantic embeddings to contextual cues, leading to suboptimal performance in fine-grained scenarios such as cloud thickness differentiation. This work introduces a dynamic dictionary learning framework that explicitly models class ID embeddings through iterative refinement. The core contribution is a novel dictionary construction mechanism in which class-aware semantic embeddings are progressively updated via multi-stage alternating cross-attention between image features and dictionary embeddings. This process enables adaptive representation learning tailored to input-specific characteristics, effectively resolving ambiguities arising from intra-class heterogeneity and inter-class homogeneity. To further enhance discriminability, a contrastive constraint on the dictionary space enforces compact intra-class distributions while maximizing inter-class separability. Extensive experiments on both coarse- and fine-grained datasets demonstrate consistent improvements over state-of-the-art methods, particularly on two online test benchmarks (LoveDA and UAVid). Code is available at https://anonymous.4open.science/r/D2LS-8267/.
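The contrastive constraint on the dictionary space can be illustrated with an InfoNCE-style loss, a common choice for this kind of objective; the paper's exact formulation may differ, and the function name, temperature, and toy data below are hypothetical. Each feature is pulled toward its own class's dictionary embedding and pushed away from the other classes' embeddings, which encourages compact intra-class distributions and inter-class separation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def dictionary_contrastive_loss(features, labels, dictionary, tau=0.1):
    """InfoNCE-style loss over a class dictionary: for each feature, the
    positive is its class embedding; all other class embeddings are negatives."""
    dict_n = [normalize(d) for d in dictionary]
    loss = 0.0
    for f, y in zip(features, labels):
        fn = normalize(f)
        sims = [dot(fn, d) / tau for d in dict_n]  # cosine similarity / temperature
        m = max(sims)
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += -(sims[y] - log_z)  # cross-entropy against the true class
    return loss / len(features)

# Toy check (hypothetical data): features near their class embedding give a
# much lower loss than the same features with swapped class assignments.
feats = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.2, 0.9]]
dictionary = [[1.0, 0.0], [0.0, 1.0]]
good = dictionary_contrastive_loss(feats, [0, 0, 1, 1], dictionary)
bad = dictionary_contrastive_loss(feats, [1, 1, 0, 0], dictionary)
```

Minimizing such a loss jointly with the segmentation objective shapes the dictionary so that class embeddings stay mutually distinct while remaining close to the features they summarize.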