🤖 AI Summary
Remote sensing few-shot segmentation faces high annotation costs and severe scarcity of novel-class samples. Existing approaches rely on specialized architectures or complex meta-learning strategies. This paper proposes a lightweight, backbone-agnostic paradigm: framing novel-class synthesis as a conditional remote sensing image inpainting task, using a diffusion model to generate semantically consistent and diverse target instances conditioned on only 1–5 novel-class samples; integrating SAM for automatic, high-precision mask generation; and applying cosine-similarity filtering to enforce semantic consistency. The synthesized data enables efficient fine-tuning with minimal overhead. Evaluated on multiple remote sensing few-shot segmentation benchmarks, the method achieves an average +12.7% mIoU improvement over prior art. With merely five samples per novel class, it attains over 90% of fully supervised performance, marking the first effective, plug-and-play application of diffusion models to remote sensing few-shot segmentation.
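The generate → segment → filter loop described above might be sketched as follows. This is a minimal sketch, not the paper's implementation: `inpaint`, `segment`, and `embed` are hypothetical stand-ins for the diffusion inpainting model, SAM, and a feature encoder, and the 0.8 similarity threshold is an assumed value.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def synthesize_novel_class_data(support_images, inpaint, segment, embed,
                                n_samples=20, sim_threshold=0.8):
    """Generate (image, mask) pairs for one novel class.

    inpaint: diffusion-based inpainting callable (hypothetical)
    segment: SAM-style mask generator (hypothetical)
    embed:   feature encoder used for similarity filtering (hypothetical)
    """
    # Reference embedding: mean feature of the 1-5 support examples.
    ref = np.mean([embed(img) for img in support_images], axis=0)
    accepted = []
    for _ in range(n_samples):
        candidate = inpaint(support_images)       # synthesize a new instance
        if cosine_similarity(embed(candidate), ref) < sim_threshold:
            continue                              # reject semantic outliers
        accepted.append((candidate, segment(candidate)))  # SAM pseudo-mask
    return accepted
```

The accepted pairs can then be mixed with the base-class data to fine-tune an off-the-shelf segmentation model directly, with no architectural changes.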
📝 Abstract
Limited data is a common problem in remote sensing due to the high cost of obtaining annotated samples. In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples. However, this often necessitates specialized model architectures or complex training strategies. Instead, we propose a simple approach that leverages diffusion models to generate diverse variations of novel-class objects within a given scene, conditioned on the limited examples of the novel classes. By framing the problem as an image inpainting task, we synthesize plausible instances of novel classes under various environments, effectively increasing the number of samples for the novel classes and mitigating overfitting. The generated samples are then assessed using a cosine similarity metric to ensure semantic consistency with the novel classes. Additionally, we employ the Segment Anything Model (SAM) to segment the generated samples and obtain precise annotations. By using high-quality synthetic data, we can directly fine-tune off-the-shelf segmentation models. Experimental results demonstrate that our method significantly enhances segmentation performance in low-data regimes, highlighting its potential for real-world remote sensing applications.
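The inpainting framing in the abstract amounts to blanking out a region of a scene and letting the conditioned diffusion model fill it with a novel-class instance. A minimal sketch of preparing those inputs, assuming a simple rectangular box (how the region is placed is a hypothetical detail, not specified by the abstract):

```python
import numpy as np

def make_inpainting_inputs(scene: np.ndarray, box):
    """Prepare (masked_scene, mask) for a diffusion inpainting model.

    scene: H x W x C image array.
    box:   (y0, y1, x0, x1) region to be synthesized; the inpainting
           model, conditioned on the novel-class examples, fills it
           with a plausible object instance.
    """
    y0, y1, x0, x1 = box
    mask = np.zeros(scene.shape[:2], dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1          # 1 = region for the model to synthesize
    masked = scene.copy()
    masked[y0:y1, x0:x1] = 0        # blank out the target region
    return masked, mask
```

Because the surrounding pixels are left untouched, the synthesized object inherits the context of the original scene, which is what lets the method place novel-class instances "under various environments".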