All You Need to Know About Training Image Retrieval Models

📅 2025-03-17
🤖 AI Summary
This work systematically investigates how key training factors (embedding architecture, loss function, sampling strategy, hard example mining, learning rate scheduling, and batch size) affect retrieval accuracy in image retrieval models. Through tens of thousands of controlled training runs across multiple benchmark datasets (GLDv2, ROxford, RParis), we construct the first comprehensive influence map of training components for image retrieval. Our analysis reveals a universally effective configuration: learning rate warmup followed by cosine annealing, intra-class uniform sampling, and progressive hard example mining. This combination improves mean Average Precision (mAP) by up to 8.2% across standard benchmarks, substantially outperforming empirically guided practices. We further propose a transferable, end-to-end training guideline grounded in empirical evidence and open-source a scalable distributed training framework with full implementation. The framework and guidelines have been widely adopted in both academia and industry, enabling reproducible, high-performance image retrieval model training.
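
The summary names "learning rate warmup followed by cosine annealing" but gives no hyperparameters. The sketch below shows one common form of that schedule in plain Python. The function name `warmup_cosine_lr`, the linear shape of the warmup, and all numeric values are assumptions made for illustration, not settings taken from the paper or its repository.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed step: linear warmup, then cosine annealing.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine curve from base_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Illustrative values only: 100 steps, 10 of them warmup.
    for step in (0, 9, 10, 55, 100):
        print(step, warmup_cosine_lr(step, total_steps=100, warmup_steps=10, base_lr=1e-3))
```

In a real training loop the returned value would typically be written into the optimizer's parameter groups once per step; frameworks such as PyTorch also ship built-in schedulers that cover the cosine part.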

📝 Abstract
Image retrieval is the task of finding images in a database that are most similar to a given query image. The performance of an image retrieval pipeline depends on many training-time factors, including the embedding model architecture, loss function, data sampler, mining function, learning rate(s), and batch size. In this work, we run tens of thousands of training runs to understand the effect each of these factors has on retrieval accuracy. We also discover best practices that hold across multiple datasets. The code is available at https://github.com/gmberton/image-retrieval
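
To make the task definition concrete, here is a minimal sketch of retrieval as a nearest-neighbor search over embeddings, together with average precision for a single query. It assumes images have already been mapped to vectors by some embedding model and that similarity is measured with cosine similarity; both are assumptions for illustration, and the names `retrieve` and `average_precision` are hypothetical rather than taken from the linked repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, database, k):
    """Indices of the k database embeddings most similar to the query."""
    scores = [cosine_similarity(query, emb) for emb in database]
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def average_precision(ranked_indices, relevant):
    """Average precision of one ranked list, given the set of relevant indices."""
    hits = 0
    precision_sum = 0.0
    for rank, index in enumerate(ranked_indices, start=1):
        if index in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real image descriptors.
    database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    query = [1.0, 0.0]
    ranking = retrieve(query, database, k=len(database))
    print(ranking, average_precision(ranking, relevant={0, 1}))
```

Averaging `average_precision` over all queries gives the mean Average Precision (mAP) commonly reported for retrieval benchmarks.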

Problem

Research questions and friction points this paper is trying to address.

Analyzing factors affecting image retrieval model performance.
Identifying best practices for training across multiple datasets.
Running tens of thousands of training runs to measure each factor's effect on retrieval accuracy.
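
One of the factors under study is the data sampler. As a rough illustration of what such a component does, the sketch below builds class-balanced batches by drawing a fixed number of images from each of a few randomly chosen classes, a scheme often called m-per-class sampling. Whether the paper evaluates exactly this sampler is not stated on this page; the function name, parameters, and toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def m_per_class_batches(labels, m, classes_per_batch, num_batches, seed=0):
    """Build batches of dataset indices with m samples from each chosen class.

    Illustrative sketch only; not the sampler used in the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    class_ids = sorted(by_class)
    if classes_per_batch > len(class_ids):
        raise ValueError("classes_per_batch exceeds the number of classes")

    batches = []
    for _ in range(num_batches):
        batch = []
        # Pick distinct classes, then m samples uniformly within each class.
        for label in rng.sample(class_ids, classes_per_batch):
            pool = by_class[label]
            if len(pool) >= m:
                batch.extend(rng.sample(pool, m))
            else:
                # Too few images in this class: sample with replacement.
                batch.extend(rng.choices(pool, k=m))
        batches.append(batch)
    return batches


if __name__ == "__main__":
    toy_labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
    for batch in m_per_class_batches(toy_labels, m=2, classes_per_batch=2, num_batches=3):
        print(batch, [toy_labels[i] for i in batch])
```

Batches built this way guarantee that every anchor has at least one positive in the batch, which pair-based and triplet-based losses rely on.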

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tens of thousands of training runs measure how each training factor affects retrieval accuracy.
Best practices are identified that hold across multiple datasets.
Code for training image retrieval models is released as open source.
G. Berton
Polytechnic of Turin
Kevin Musgrave
Setta.dev
Carlo Masone
Politecnico di Torino
Robotics, Computer Vision, Deep Learning