DINO Soars: DINOv3 for Open-Vocabulary Semantic Segmentation of Remote Sensing Imagery

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses semantic segmentation of remote sensing imagery, where densely annotated data are scarce and existing zero-shot methods lag behind supervised ones. It introduces CAFe-DINO, a framework that adapts DINOv3 for open-vocabulary remote sensing segmentation. By combining a cost aggregation module with training-free upsampling of text-image similarity scores, CAFe-DINO segments remote sensing imagery without any fine-tuning on remote sensing data. The model is instead fine-tuned only on a remote-sensing-targeted subset of COCO-Stuff, yet it surpasses open-vocabulary approaches that are fine-tuned on remote sensing data, attaining state-of-the-art results on key remote sensing segmentation datasets.
📝 Abstract
The remote sensing (RS) domain suffers from a lack of densely labeled datasets, which are costly to obtain. Thus, models that can segment RS imagery well without supervised fine-tuning are valuable, but existing solutions fall behind supervised methods. Recently, DINOv3 surpassed SOTA RS foundation models on the GEO-bench segmentation benchmark without pre-training on RS data. Additionally, DINO.txt has enabled open-vocabulary semantic segmentation (OVSS) with the DINOv3 backbone. We leverage these developments to form an OVSS model for RS imagery that is free of RS-domain fine-tuning. Our model, CAFe-DINO (Cost Aggregation + Feature Upsampling with DINO), exploits the strong OVSS performance of DINOv3 for RS imagery via cost aggregation and training-free upsampling of text-image similarity scores. The robust latent representations of the DINOv3 backbone eliminate the need for fine-tuning on RS imagery; we instead fine-tune our model on an RS-targeted subset of COCO-Stuff. CAFe-DINO achieves state-of-the-art performance on key RS segmentation datasets, outperforming OVSS methods fine-tuned on RS data. Our code and data are publicly available at https://github.com/rfaulk/DINO_Soars.
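To make the idea of "text-image similarity scores" concrete, here is a minimal NumPy sketch of how such scores can be formed and turned into a segmentation map. It is an illustration under stated assumptions, not the CAFe-DINO implementation: the function names (`cost_volume`, `bilinear_upsample`, `segment`) are hypothetical, the patch features and text embeddings are assumed to already live in a shared embedding space, the learned cost aggregation module is omitted entirely, and plain bilinear interpolation stands in for whatever training-free upsampling the paper actually uses.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cost_volume(patch_feats, text_embeds):
    """Cosine similarity between every image patch and every class prompt.

    patch_feats: (Hp, Wp, D) patch embeddings from an image backbone (assumed).
    text_embeds: (C, D) one embedding per class name (assumed).
    Returns an array of shape (Hp, Wp, C).
    """
    p = l2_normalize(patch_feats)
    t = l2_normalize(text_embeds)
    return np.einsum("hwd,cd->hwc", p, t)


def bilinear_upsample(scores, out_h, out_w):
    """Resize (Hp, Wp, C) scores to (out_h, out_w, C) with bilinear weights.

    Stand-in for a training-free upsampler; it has no learned parameters.
    """
    in_h, in_w, _ = scores.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights
    wx = (xs - x0)[None, :, None]  # horizontal blend weights
    top = scores[y0][:, x0] * (1 - wx) + scores[y0][:, x1] * wx
    bot = scores[y1][:, x0] * (1 - wx) + scores[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy


def segment(patch_feats, text_embeds, out_h, out_w):
    """Per-pixel class index: argmax over upsampled similarity scores."""
    scores = cost_volume(patch_feats, text_embeds)
    return bilinear_upsample(scores, out_h, out_w).argmax(axis=-1)


# Toy usage: a 2x2 patch grid whose left column matches class 0
# and whose right column matches class 1, upsampled to 4x4 pixels.
e0 = np.array([1.0, 0.0, 0.0, 0.0])
e1 = np.array([0.0, 1.0, 0.0, 0.0])
toy_patches = np.array([[e0, e1], [e0, e1]])
toy_text = np.stack([e0, e1])
toy_labels = segment(toy_patches, toy_text, 4, 4)
```

In this toy case the left half of `toy_labels` is assigned class 0 and the right half class 1. In a full pipeline, a learned aggregation step would refine the cost volume before upsampling, and the vocabulary could be changed at inference time simply by swapping the rows of `text_embeds`.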
Problem

Research questions and friction points this paper is trying to address.

remote sensing
open-vocabulary semantic segmentation
dense labeling
foundation models
domain adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

open-vocabulary semantic segmentation
DINOv3
remote sensing
cost aggregation
training-free upsampling