Class Agnostic Instance-level Descriptor for Visual Instance Search

📅 2025-06-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Visual instance search is hampered by the lack of effective instance-level feature representations, particularly for objects from unknown categories. This paper proposes an unsupervised open-world instance search method that localizes and characterizes multi-scale latent instances in images without requiring category labels. Its core innovation is a hierarchical, compact subset detection mechanism built on self-supervised ViT features: it jointly discovers instance-level feature subsets via spectral clustering and encodes spatial regions without category constraints, thereby unifying the modeling of nested structures, occlusions, and cross-category variations in semantic scale. Evaluated on three standard benchmarks, the approach considerably outperforms state-of-the-art methods and remains robust and accurate on both known and unknown categories. It is presented as the first work to achieve high-quality instance-level feature learning and search under fully unsupervised conditions.

📝 Abstract

Despite the great success of deep features in content-based image retrieval, visual instance search remains challenging due to the lack of an effective instance-level feature representation. Supervised and weakly supervised object detection methods are not viable options because they perform poorly on unknown object categories. In this paper, based on the feature set output by a self-supervised ViT, instance-level region discovery is modeled as detecting compact feature subsets in a hierarchical fashion. The hierarchical decomposition yields a hierarchy of feature subsets, whose non-leaf and leaf nodes correspond to instance regions of different semantic scales in an image. This decomposition handles object embedding (objects nested within other objects) and occlusion, both of which are widely observed in real scenarios. The features derived from the nodes of the hierarchy make up a comprehensive representation of the latent instances in the image. The resulting instance-level descriptor remains effective on both known and unknown object categories. Empirical studies on three instance search benchmarks show that it outperforms state-of-the-art methods considerably.
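
The hierarchical decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how compact subsets are detected, so this example swaps in a simple deterministic 2-means split as a stand-in. The function names (`decompose`, `two_way_split`, `collect_descriptors`), the `min_size` and `max_depth` parameters, and the mean-pooled, L2-normalized node descriptor are all assumptions made for illustration, not the authors' method. The input is assumed to be a list of patch feature vectors, such as those a self-supervised ViT would produce.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_way_split(features, indices, iterations=10):
    """Deterministic 2-means split: a stand-in, NOT the paper's detector."""
    points = [features[i] for i in indices]
    center = mean_vector(points)
    # Seed with the point farthest from the mean, then the point farthest from it.
    seed_a = max(points, key=lambda p: sq_dist(p, center))
    seed_b = max(points, key=lambda p: sq_dist(p, seed_a))
    centers = [seed_a, seed_b]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for i in indices:
            d0 = sq_dist(features[i], centers[0])
            d1 = sq_dist(features[i], centers[1])
            groups[0 if d0 <= d1 else 1].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        centers = [mean_vector([features[i] for i in g]) for g in groups]
    return groups


def decompose(features, indices=None, min_size=2, max_depth=3):
    """Recursively split a feature set into a hierarchy of subsets.

    Every node (leaf or non-leaf) stores the patch indices it covers and a
    pooled descriptor (assumed here: L2-normalized mean of its features).
    """
    if indices is None:
        indices = list(range(len(features)))
    node = {
        "indices": indices,
        "descriptor": l2_normalize(mean_vector([features[i] for i in indices])),
        "children": [],
    }
    if max_depth > 0 and len(indices) >= 2 * min_size:
        left, right = two_way_split(features, indices)
        # Keep the split only if both halves are large enough.
        if len(left) >= min_size and len(right) >= min_size:
            node["children"] = [
                decompose(features, left, min_size, max_depth - 1),
                decompose(features, right, min_size, max_depth - 1),
            ]
    return node


def collect_descriptors(node):
    """Gather descriptors from all nodes, coarse (root) to fine (leaves)."""
    descriptors = [node["descriptor"]]
    for child in node["children"]:
        descriptors.extend(collect_descriptors(child))
    return descriptors
```

Keeping descriptors from non-leaf nodes as well as leaves is what lets a single image be represented at several semantic scales, for example a whole object at one node and one of its parts at a child node.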

Problem

Research questions and friction points this paper is trying to address.

Lack of effective instance-level feature representation for visual search

Poor performance on unknown object categories with supervised methods

Challenges in object embedding and occlusions in real scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised ViT for feature extraction

Hierarchical decomposition for instance regions

Comprehensive representation for unknown categories
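
One plausible way to use such a set of node descriptors for instance search is sketched below. It assumes each database image is stored as a list of descriptor vectors and that images are ranked by the best cosine similarity between the query descriptor and any of their node descriptors. This scoring rule and the names `cosine` and `rank_images` are hypothetical; the page does not describe the matching scheme the paper actually uses.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_images(query, database):
    """Rank images by their best-matching node descriptor.

    `database` maps an image id to that image's list of node descriptors.
    An image with no descriptors receives the lowest possible score (-1.0).
    """
    scored = []
    for image_id, descriptors in database.items():
        best = max((cosine(query, d) for d in descriptors), default=-1.0)
        scored.append((image_id, best))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Scoring by the maximum over nodes means a small query instance can match one region of a cluttered image without being diluted by the rest of the scene.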

🔎 Similar Papers

Open-World Object Detection with Instance Representation Learning