INSID3: Training-Free In-Context Segmentation with DINOv3

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of performing context-aware segmentation of arbitrary concepts—such as objects, parts, or personalized instances—from a single annotated example, without any fine-tuning. The proposed method leverages a frozen, self-supervised DINOv3 backbone to extract dense features and combines semantic correspondence with spatial structural cues, yielding fully training-free, multi-granularity segmentation. It is the first to demonstrate that a single self-supervised model can simultaneously support semantic matching and segmentation without fine-tuning, auxiliary models, or mask- or class-level supervision. The approach achieves state-of-the-art performance across one-shot semantic, part, and personalized segmentation tasks, improving mIoU by 7.5% while using only one-third of the parameters of prior methods.
📝 Abstract
In-context segmentation (ICS) aims to segment arbitrary concepts, e.g., objects, parts, or personalized instances, given a single annotated visual example. Existing work either (i) fine-tunes vision foundation models (VFMs), which improves in-domain results but harms generalization, or (ii) combines multiple frozen VFMs, which preserves generalization but yields architectural complexity and fixed segmentation granularities. We revisit ICS from a minimalist perspective and ask: Can a single self-supervised backbone support both semantic matching and segmentation, without any supervision or auxiliary models? We show that scaled-up dense self-supervised features from DINOv3 exhibit strong spatial structure and semantic correspondence. We introduce INSID3, a training-free approach that segments concepts at varying granularities solely from frozen DINOv3 features, given an in-context example. INSID3 achieves state-of-the-art results across one-shot semantic, part, and personalized segmentation, outperforming previous work by +7.5% mIoU, while using 3x fewer parameters and without any mask- or category-level supervision. Code is available at https://github.com/visinf/INSID3.
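The core idea of training-free in-context segmentation via feature correspondence can be sketched as nearest-prototype cosine matching: each target patch is labeled foreground if its dense feature is more similar to the annotated foreground patches of the reference image than to its background patches. This is a minimal illustration under assumed inputs (precomputed patch features, e.g. from a frozen DINOv3 backbone); the function names are ours, and the actual INSID3 pipeline additionally exploits spatial structural cues not shown here.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return a @ b.T

def in_context_segment(ref_feats: np.ndarray,
                       ref_mask: np.ndarray,
                       tgt_feats: np.ndarray) -> np.ndarray:
    """Label each target patch via nearest-prototype matching.

    ref_feats: (N_ref, D) dense patch features of the reference image
    ref_mask:  (N_ref,) boolean annotation (True = foreground patch)
    tgt_feats: (N_tgt, D) dense patch features of the target image
    Returns a (N_tgt,) boolean foreground mask.
    """
    fg = ref_feats[ref_mask]      # annotated foreground patches
    bg = ref_feats[~ref_mask]     # remaining background patches
    # Max similarity of each target patch to either prototype set.
    fg_sim = cosine_sim(tgt_feats, fg).max(axis=1)
    bg_sim = cosine_sim(tgt_feats, bg).max(axis=1)
    return fg_sim > bg_sim

# Toy example: foreground features cluster near [1,0,0], background near [0,1,0].
ref = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])
mask = np.array([True, True, False, False])
tgt = np.array([[0.95, 0.05, 0.0], [0.0, 1.0, 0.05]])
print(in_context_segment(ref, mask, tgt))  # → [ True False]
```

Because the backbone stays frozen and only similarities are computed, the same routine works for any concept given a single annotated example, which is what makes the approach training-free.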
Problem

Research questions and friction points this paper is trying to address.

in-context segmentation
one-shot segmentation
vision foundation models
semantic segmentation
training-free
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
in-context segmentation
DINOv3
self-supervised features
few-shot segmentation