QuARI: Query Adaptive Retrieval Improvement

📅 2025-05-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the insufficient discriminability of large-scale vision-language models (VLMs) in ultra-large-scale instance retrieval, this paper proposes a query-adaptive linear feature space transformation method: for each text or image query, it dynamically generates a lightweight, learnable, query-specific projection matrix to enable query-level personalized cross-modal feature mapping. Built upon VLM embeddings, the method jointly optimizes the query encoder and domain adaptation objective, achieving substantial retrieval accuracy gains while incurring negligible inference overhead (only ~0.1% additional parameters). It consistently outperforms state-of-the-art methods on major large-scale instance retrieval benchmarks (e.g., Instre, Oxford-Paris-105K, GLDv2), reduces re-ranking latency by one to two orders of magnitude, and enables real-time retrieval over tens of millions of images. The core contribution is the first introduction of query-driven linear transformations into VLM-based cross-modal retrieval, thereby overcoming the representational bottleneck imposed by fixed projection matrices.

📝 Abstract
Massive-scale pretraining has made vision-language models increasingly popular for image-to-image and text-to-image retrieval across a broad collection of domains. However, these models do not perform well when used for challenging retrieval tasks, such as instance retrieval in very large-scale image collections. Recent work has shown that linear transformations of VLM features trained for instance retrieval can improve performance by emphasizing subspaces that relate to the domain of interest. In this paper, we explore a more extreme version of this specialization by learning to map a given query to a query-specific feature space transformation. Because this transformation is linear, it can be applied with minimal computational cost to millions of image embeddings, making it effective for large-scale retrieval or re-ranking. Results show that this method consistently outperforms state-of-the-art alternatives, including those that require many orders of magnitude more computation at query time.
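The idea in the abstract, mapping each query to its own linear transformation and then applying that transformation to precomputed image embeddings, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the low-rank form T = I + U V^T, the random weights `w_u` and `w_v` standing in for a trained query-to-transform network, and the choice to transform both the query and the gallery before cosine scoring.

```python
import numpy as np


def predict_transform(query, w_u, w_v, rank):
    """Hypothetical stand-in for a learned query-to-transform network.

    Returns low-rank factors (U, V) such that T = I + U @ V.T.
    A real system would use trained weights; here they are random.
    """
    dim = query.shape[0]
    u = (w_u @ query).reshape(dim, rank)
    v = (w_v @ query).reshape(dim, rank)
    return u, v


def apply_transform(x, u, v):
    """Compute x @ T.T for T = I + U @ V.T without materializing T."""
    return x + (x @ v) @ u.T


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def rank_gallery(query, gallery, w_u, w_v, rank):
    """Rank gallery embeddings by cosine similarity in the query-specific space."""
    u, v = predict_transform(query, w_u, w_v, rank)
    q_t = l2_normalize(apply_transform(query[None, :], u, v))
    g_t = l2_normalize(apply_transform(gallery, u, v))
    scores = (g_t @ q_t.T).ravel()
    return np.argsort(-scores), scores


# Toy data: random "VLM embeddings" and a query near gallery item 123.
rng = np.random.default_rng(0)
dim, rank, n_images = 64, 4, 10_000
gallery = l2_normalize(rng.standard_normal((n_images, dim)))
query = gallery[123] + 0.05 * rng.standard_normal(dim)

# Assumed (untrained) weights of the query-to-transform network.
w_u = 0.01 * rng.standard_normal((dim * rank, dim))
w_v = 0.01 * rng.standard_normal((dim * rank, dim))

order, scores = rank_gallery(query, gallery, w_u, w_v, rank)
print("top-5 gallery indices:", order[:5])
```

Because the transformation is linear, the gallery side of the computation is a pair of matrix products over embeddings that were computed once offline. No image is passed through the VLM again at query time, which is the property the abstract relies on for large-scale retrieval and re-ranking.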
Problem

Research questions and friction points this paper is trying to address.

Improving retrieval performance in challenging large-scale tasks
Learning query-specific feature space transformations efficiently
Outperforming state-of-the-art methods with minimal computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns query-specific feature space transformation
Applies linear transformation to image embeddings
Improves large-scale retrieval performance efficiently
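
The points above can be written out as a short formula. The notation is an assumption for illustration and is not taken from the paper: $q$ is the query embedding, $x_i$ is the embedding of gallery image $i$, $X \in \mathbb{R}^{N \times d}$ stacks all $N$ gallery embeddings as rows, and $T_q \in \mathbb{R}^{d \times d}$ is the transformation generated for $q$. Applying $T_q$ to both sides of the comparison is also an assumed design choice.

```latex
s(q, x_i) = \frac{(T_q q)^\top (T_q x_i)}{\lVert T_q q \rVert \, \lVert T_q x_i \rVert},
\qquad
\hat{X} = X T_q^\top
```

Under these assumptions, transforming the whole gallery is a single matrix product costing $O(N d^2)$ operations for a dense $T_q$, with no further forward pass through the VLM. This matches the abstract's statement that a linear transformation can be applied to millions of image embeddings at minimal computational cost.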