ILIAS: Instance-Level Image retrieval At Scale

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of evaluating instance-level image retrieval at large scale, across diverse domains, and at fine granularity. Methodologically, it combines multi-domain supervised linear adaptation, evaluation of vision-language models (VLMs), local-feature re-ranking, and cross-modal (text-to-image) evaluation. Its primary contribution is ILIAS, a new large-scale benchmark for instance-level retrieval, comprising query and positive images for 1,000 real-world object instances retrieved against 100 million distractor images, balancing scale, domain diversity, and accurate instance-level ground truth. To avoid false negatives without additional human labeling, it includes only query objects confirmed to have emerged after 2014, the compilation date of YFCC100M, so the distractor pool cannot contain unannotated positives. Experimental results show: (1) models fine-tuned on specific domains, such as landmarks or products, generalize poorly beyond those domains; (2) multi-domain class supervision via a linear adaptation layer substantially improves performance, especially for VLMs; (3) local descriptors remain a key ingredient in re-ranking, especially under severe background clutter; and (4) text-to-image retrieval performance is surprisingly close to that of image-to-image retrieval.

📝 Abstract
This work introduces ILIAS, a new test dataset for Instance-Level Image retrieval At Scale. It is designed to evaluate the ability of current and future foundation models and retrieval techniques to recognize particular objects. The key benefits over existing datasets include large scale, domain diversity, accurate ground truth, and a performance that is far from saturated. ILIAS includes query and positive images for 1,000 object instances, manually collected to capture challenging conditions and diverse domains. Large-scale retrieval is conducted against 100 million distractor images from YFCC100M. To avoid false negatives without extra annotation effort, we include only query objects confirmed to have emerged after 2014, i.e. the compilation date of YFCC100M. An extensive benchmarking is performed with the following observations: i) models fine-tuned on specific domains, such as landmarks or products, excel in that domain but fail on ILIAS; ii) learning a linear adaptation layer using multi-domain class supervision results in performance improvements, especially for vision-language models; iii) local descriptors in retrieval re-ranking are still a key ingredient, especially in the presence of severe background clutter; iv) the text-to-image performance of the vision-language foundation models is surprisingly close to the corresponding image-to-image case. Website: https://vrg.fel.cvut.cz/ilias/
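Observation (ii), a linear adaptation layer applied on top of frozen global descriptors before nearest-neighbor retrieval, can be sketched as follows. This is a toy illustration, not the paper's actual setup: the embeddings are synthetic, and the `adapt` matrix is random, whereas in the paper it would be trained with multi-domain class supervision.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen global descriptors from a pre-trained backbone (synthetic stand-ins).
db = rng.normal(size=(100, 64)).astype(np.float32)  # 100 gallery images
# Query: a noisy view of gallery item 7 (same instance, different conditions).
query = db[7] + 0.05 * rng.normal(size=64).astype(np.float32)

# Hypothetical learned linear adaptation; random here for illustration only.
adapt = rng.normal(size=(64, 64)).astype(np.float32) / np.sqrt(64)

def embed(x, W):
    """Apply the adaptation layer and L2-normalize the result."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

gallery = embed(db, adapt)
q = embed(query[None, :], adapt)[0]

# Cosine-similarity retrieval: rank the gallery by dot product with the query.
scores = gallery @ q
ranking = np.argsort(-scores)
print(ranking[:5])
```

Because the adaptation is a single linear map, it can be folded into the database embeddings offline, so retrieval cost at query time is unchanged.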
Problem

Research questions and friction points this paper is trying to address.

Evaluate image retrieval models
Handle diverse object recognition
Benchmark against large-scale distractors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale image retrieval dataset
Multi-domain class supervision
Local descriptors re-ranking
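The re-ranking idea above is commonly implemented by scoring each shortlisted candidate by its number of tentative local-feature correspondences with the query. A minimal sketch with synthetic descriptors follows; mutual-nearest-neighbor matching is used here as a generic stand-in, not necessarily the paper's exact re-ranking method.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2n(x):
    """L2-normalize each row."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def mutual_nn_matches(desc_a, desc_b):
    """Count mutual-nearest-neighbor pairs between two sets of
    L2-normalized local descriptors (a common tentative-matching rule)."""
    sim = desc_a @ desc_b.T
    nn_ab = sim.argmax(axis=1)  # best match in b for each row of a
    nn_ba = sim.argmax(axis=0)  # best match in a for each row of b
    # A pair (i, j) counts only if i's best is j AND j's best is i.
    return int(sum(nn_ba[j] == i for i, j in enumerate(nn_ab)))

# 30 local descriptors for the query image (synthetic, 32-d).
query_loc = l2n(rng.normal(size=(30, 32)))

# Candidate 0 shares 20 local features with the query (same object under
# background clutter); candidate 1 is an unrelated image.
cand0 = l2n(np.vstack([query_loc[:20] + 0.05 * rng.normal(size=(20, 32)),
                       rng.normal(size=(15, 32))]))
cand1 = l2n(rng.normal(size=(35, 32)))

# Re-rank the global-retrieval shortlist by tentative match count.
shortlist = [cand0, cand1]
match_counts = [mutual_nn_matches(query_loc, c) for c in shortlist]
order = np.argsort(match_counts)[::-1]
print(match_counts, order)
```

Because match counts depend on localized evidence rather than a single global vector, the true candidate wins even though much of its image is clutter, which mirrors the benchmark's observation that local descriptors matter most under severe background clutter.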