Back To The Drawing Board: Rethinking Scene-Level Sketch-Based Image Retrieval

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
In scene-level sketch-based image retrieval (SBIR), free-hand sketches are inherently ambiguous and noisy, which makes it hard to align their semantics and spatial layout with natural images. To address this, the paper proposes a robust cross-modal alignment framework. Its main methodological contribution is a training objective explicitly designed to tolerate sketch diversity, and the experiments show that the training strategy has a critical impact on retrieval performance. Rather than increasing model complexity, the approach achieves alignment by combining pretrained initialization, lightweight encoder refinement, and a customized contrastive loss. It reports state-of-the-art results on FS-COCO and SketchyCOCO. The paper also argues that the evaluation scenarios for scene-level SBIR should be improved to better reflect realistic sketches.
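The summary mentions a customized contrastive loss but does not give its formulation. Purely as a point of reference, the sketch below implements the standard symmetric InfoNCE (CLIP-style) loss between a batch of sketch embeddings and image embeddings, assuming the i-th sketch and the i-th image form a matching pair and every other item in the batch is a negative. The temperature of 0.07 and all function names are assumptions for illustration, not the authors' settings; pure Python keeps it dependency-free.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def mean_diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy over rows, where the correct column for row i is i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # shift by the row max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(sketches: List[Vector], images: List[Vector],
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: sketch i and image i are the positive pair; all other
    items in the batch serve as negatives, in both retrieval directions."""
    s = [normalize(v) for v in sketches]
    im = [normalize(v) for v in images]
    # logits[i][j] = cos(sketch_i, image_j) / temperature
    logits = [[sum(a * b for a, b in zip(si, ij)) / temperature for ij in im] for si in s]
    image_to_sketch = [list(col) for col in zip(*logits)]  # transpose for the reverse direction
    return 0.5 * (mean_diagonal_cross_entropy(logits)
                  + mean_diagonal_cross_entropy(image_to_sketch))


# Usage: correctly paired embeddings should score a lower loss than mismatched ones.
sketches = [[1.0, 0.0], [0.0, 1.0]]
images = [[0.9, 0.1], [0.1, 0.9]]
aligned = symmetric_info_nce(sketches, images)
swapped = symmetric_info_nce(sketches, images[::-1])
print(aligned < swapped)  # True
```

Averaging both directions treats sketch-to-image and image-to-sketch retrieval symmetrically; a task that only ever queries with sketches could keep just the first term.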

📝 Abstract
The goal of Scene-level Sketch-Based Image Retrieval (SBIR) is to retrieve natural images that match the overall semantics and spatial layout of a free-hand sketch. Unlike prior work focused on architectural augmentations of retrieval models, we emphasize the inherent ambiguity and noise present in real-world sketches. This insight motivates a training objective that is explicitly designed to be robust to sketch variability. We show that with an appropriate combination of pre-training, encoder architecture, and loss formulation, it is possible to achieve state-of-the-art performance without introducing additional complexity. Extensive experiments on the challenging FS-COCO and the widely used SketchyCOCO datasets confirm the effectiveness of our approach and underline the critical role of training design in cross-modal retrieval tasks, as well as the need to improve the evaluation scenarios of scene-level SBIR.
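Retrieval of this kind is typically scored by ranking a gallery of natural images for each sketch query. As a minimal sketch, assuming sketch and image embeddings have already been produced by some encoder (not shown), the snippet below ranks a gallery by cosine similarity and computes Recall@K, a metric commonly reported for sketch-based retrieval. The function names and toy data are hypothetical, and the exact protocol the paper uses on FS-COCO and SketchyCOCO may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_at_k(queries: List[Sequence[float]], gallery: List[Sequence[float]],
                targets: List[int], k: int) -> float:
    """Fraction of sketch queries whose ground-truth gallery index (targets[i])
    is among the k gallery images most similar to that query."""
    hits = 0
    for query, target in zip(queries, targets):
        # Rank every gallery index by similarity to this query, best first.
        ranked = sorted(range(len(gallery)),
                        key=lambda j: cosine(query, gallery[j]), reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Usage with toy 2-D embeddings: the third query's true image ranks last.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
targets = [0, 1, 2]
print(recall_at_k(queries, gallery, targets, k=1))  # 0.6666666666666666
print(recall_at_k(queries, gallery, targets, k=3))  # 1.0
```

Because Recall@K only checks whether the single ground-truth image appears in the top K, it does not credit other gallery images that would also be reasonable matches for an ambiguous sketch, which is one reason evaluation design matters for scene-level SBIR.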
Problem

Research questions and friction points this paper is trying to address.

Addressing ambiguity and noise in real-world scene sketches
Developing robust training for sketch variability in retrieval
Improving cross-modal retrieval without added complexity
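The summary does not say how sketch variability is modeled during training. Purely to make "sketch variability" concrete at the data level, the hypothetical augmentation below represents a sketch as a list of strokes (each a list of (x, y) points), randomly drops whole strokes, and jitters point coordinates. The stroke representation, `perturb_sketch`, and its parameters are assumptions for illustration and are not the paper's method, which targets robustness through the training objective.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def perturb_sketch(strokes: List[Stroke], drop_prob: float = 0.2,
                   jitter: float = 0.01, seed: int = 0) -> List[Stroke]:
    """Hypothetical augmentation: randomly drop whole strokes and add Gaussian
    jitter to every point, mimicking incomplete, imprecise free-hand sketches."""
    if not strokes:
        return []
    rng = random.Random(seed)  # local generator so results are reproducible
    kept = [s for s in strokes if rng.random() >= drop_prob]
    if not kept:  # never return an empty sketch
        kept = [rng.choice(strokes)]
    return [[(x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter)) for x, y in s]
            for s in kept]


# Usage: a three-stroke toy sketch in normalized [0, 1] coordinates.
sketch = [
    [(0.1, 0.1), (0.4, 0.1)],              # a horizontal line
    [(0.5, 0.5), (0.6, 0.7), (0.7, 0.5)],  # a small peak
    [(0.2, 0.8), (0.2, 0.9)],              # a short vertical line
]
noisy = perturb_sketch(sketch, drop_prob=0.3, jitter=0.02, seed=42)
print(len(noisy) <= len(sketch))  # True
```

Feeding such perturbed sketches through the same encoder is one way to probe whether a trained model's retrieval ranking stays stable under realistic sketch noise.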
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust training objective for sketch variability
Careful choice of pre-training and encoder architecture
Effective loss formulation without added complexity
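The list credits the results to the loss formulation rather than extra modules, but the exact loss is not described here. One generic way to make a contrastive objective less brittle when a sketch could plausibly match several images is label smoothing, which reserves a small share of the target probability for non-matching images. The sketch below applies it to the sketch-to-image direction of an InfoNCE-style loss; the smoothing value 0.1, the temperature 0.07, and the function name are assumptions, and this is not claimed to be the paper's objective.

```python
import math
from typing import List


def smoothed_contrastive_loss(sim: List[List[float]], temperature: float = 0.07,
                              smoothing: float = 0.1) -> float:
    """Sketch-to-image cross-entropy over a square batch similarity matrix, where
    sim[i][j] is the cosine similarity of sketch i and image j. With label
    smoothing, image i receives (1 - smoothing) + smoothing / n of the target
    mass and every other image receives smoothing / n."""
    n = len(sim)
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        for j, logit in enumerate(logits):
            target = (1.0 - smoothing) * (1.0 if i == j else 0.0) + smoothing / n
            total -= target * (logit - log_z)  # soft-target cross-entropy term
    return total / n


# Usage: on a confidently matched batch, smoothing raises the loss, discouraging
# the model from becoming over-confident about any single pairing.
sim = [[1.0, 0.0], [0.0, 1.0]]
print(smoothed_contrastive_loss(sim, smoothing=0.1) > smoothed_contrastive_loss(sim, smoothing=0.0))  # True
```

With `smoothing=0.0` this reduces to the ordinary one-directional InfoNCE cross-entropy, so the smoothing term is the only change to a standard contrastive loss.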