🤖 AI Summary
This work addresses two key limitations in image segmentation: heavy reliance on dense pixel-level annotations and insufficient user control over segmentation outcomes. To this end, we propose an interactive segmentation paradigm grounded in hand-drawn sketches. Methodologically, we introduce a sketch-in-the-loop segmentation framework that combines sketch-based image retrieval (SBIR) models, which supply the training signal, with fine-tuned large-scale pre-trained vision models (CLIP or DINOv2), which perform the segmentation. We also design a sketch-driven augmentation strategy that supports flexible inputs, including local, global, and multi-region sketches. Our core contributions are threefold: (1) mask-free, multi-granularity segmentation with explicit control over user intent; (2) replacing mask supervision with a training signal drawn from SBIR models; and (3) superior performance over existing approaches across diverse benchmark datasets and evaluation scenarios.
📝 Abstract
In this paper, we expand the domain of sketch research into the field of image segmentation, aiming to establish freehand sketches as a query modality for subjective image segmentation. Our innovative approach introduces a "sketch-in-the-loop" image segmentation framework, enabling the segmentation of visual concepts partially, completely, or in groupings - a truly "freestyle" approach - without the need for a purpose-made dataset (i.e., mask-free). This framework capitalises on the synergy between sketch-based image retrieval (SBIR) models and large-scale pre-trained models (CLIP or DINOv2). The former provides an effective training signal, while fine-tuned versions of the latter execute the subjective segmentation. Additionally, our purpose-made augmentation strategy enhances the versatility of our sketch-guided mask generation, allowing segmentation at multiple granularity levels. Extensive evaluations across diverse benchmark datasets underscore the superior performance of our method in comparison to existing approaches across various evaluation scenarios.