INSID3: Training-Free In-Context Segmentation with DINOv3

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of performing context-aware segmentation of arbitrary concepts—such as objects, parts, or personalized instances—from a single annotated example, without any fine-tuning. The proposed method leverages a frozen, self-supervised DINOv3 backbone to extract dense features and combines semantic correspondence with spatial structural cues, yielding fully training-free, multi-granularity segmentation. It is the first to demonstrate that a single self-supervised model can simultaneously support semantic matching and segmentation without fine-tuning, auxiliary models, or mask- or class-level supervision. The approach achieves state-of-the-art performance across one-shot semantic, part, and personalized segmentation tasks, improving mIoU by 7.5% while using only one-third of the parameters of prior methods.
📝 Abstract
In-context segmentation (ICS) aims to segment arbitrary concepts, e.g., objects, parts, or personalized instances, given a single annotated visual example. Existing work either (i) fine-tunes vision foundation models (VFMs), which improves in-domain results but harms generalization, or (ii) combines multiple frozen VFMs, which preserves generalization but yields architectural complexity and fixed segmentation granularities. We revisit ICS from a minimalist perspective and ask: Can a single self-supervised backbone support both semantic matching and segmentation, without any supervision or auxiliary models? We show that scaled-up dense self-supervised features from DINOv3 exhibit strong spatial structure and semantic correspondence. We introduce INSID3, a training-free approach that segments concepts at varying granularities solely from frozen DINOv3 features, given an in-context example. INSID3 achieves state-of-the-art results across one-shot semantic, part, and personalized segmentation, outperforming previous work by +7.5% mIoU, while using 3x fewer parameters and without any mask- or category-level supervision. Code is available at https://github.com/visinf/INSID3.
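The core idea of training-free in-context segmentation via feature correspondence can be sketched as nearest-prototype cosine matching: each target patch is labeled foreground if its dense feature is more similar to the annotated foreground patches of the reference image than to its background patches. This is a minimal illustration under assumed inputs (precomputed patch features, e.g. from a frozen DINOv3 backbone); the function names are ours, and the actual INSID3 pipeline additionally exploits spatial structural cues not shown here.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return a @ b.T

def in_context_segment(ref_feats: np.ndarray,
                       ref_mask: np.ndarray,
                       tgt_feats: np.ndarray) -> np.ndarray:
    """Label each target patch via nearest-prototype matching.

    ref_feats: (N_ref, D) dense patch features of the reference image
    ref_mask:  (N_ref,) boolean annotation (True = foreground patch)
    tgt_feats: (N_tgt, D) dense patch features of the target image
    Returns a (N_tgt,) boolean foreground mask.
    """
    fg = ref_feats[ref_mask]      # annotated foreground patches
    bg = ref_feats[~ref_mask]     # remaining background patches
    # Max similarity of each target patch to either prototype set.
    fg_sim = cosine_sim(tgt_feats, fg).max(axis=1)
    bg_sim = cosine_sim(tgt_feats, bg).max(axis=1)
    return fg_sim > bg_sim

# Toy example: foreground features cluster near [1,0,0], background near [0,1,0].
ref = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])
mask = np.array([True, True, False, False])
tgt = np.array([[0.95, 0.05, 0.0], [0.0, 1.0, 0.05]])
print(in_context_segment(ref, mask, tgt))  # → [ True False]
```

Because the backbone stays frozen and only similarities are computed, the same routine works for any concept given a single annotated example, which is what makes the approach training-free.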
Problem

Research questions and friction points this paper is trying to address.

in-context segmentation
one-shot segmentation
vision foundation models
semantic segmentation
training-free
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
in-context segmentation
DINOv3
self-supervised features
few-shot segmentation