🤖 AI Summary
Problem: Image retrieval lacks statistical guarantees on result reliability, so it is hard to ensure, at a quantifiable confidence level, that retrieved results contain the true nearest neighbors. Method: We propose the first conformal retrieval framework with rigorous statistical coverage guarantees. It requires no model retraining or distributional assumptions and is compatible with arbitrary embedding models and data distributions. Using conformal prediction, the method calibrates distance uncertainty to construct retrieval sets that meet a user-specified coverage level (e.g., 90%) while keeping the sets as small as possible. Results: Evaluated on CAR-196, CUB-200, Pittsburgh, and ChestX-Det, our approach achieves the target coverage and consistently outperforms baselines in both retrieval accuracy and efficiency. This work establishes the first verifiable, plug-and-play reliability assurance mechanism for image retrieval.
📝 Abstract
Most image retrieval research prioritizes improving predictive performance, often overlooking situations where the reliability of predictions is equally important. The gap between model performance and reliability requirements highlights the need for a systematic approach to analyzing and addressing the risks associated with image retrieval. Uncertainty quantification techniques can mitigate this issue by assessing the uncertainty of retrieval sets, but they provide only a heuristic estimate of uncertainty rather than a guarantee. To address this limitation, we present Risk Controlled Image Retrieval (RCIR), which generates retrieval sets with a coverage guarantee, i.e., retrieval sets that are guaranteed to contain the true nearest neighbors with a predefined probability. RCIR can be easily integrated with existing uncertainty-aware image retrieval systems and is agnostic to data distribution and model selection. To the best of our knowledge, this is the first work that provides coverage guarantees for image retrieval. The validity and efficiency of RCIR are demonstrated on four real-world datasets: CAR-196, CUB-200, Pittsburgh, and ChestX-Det.
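
To make the idea of a coverage guarantee concrete, the following is a minimal sketch of generic split-conformal calibration applied to retrieval set size. It is not RCIR's actual procedure, which this abstract does not spell out. The function names (`ranked_indices`, `first_hit_rank`, `calibrate_set_size`, `retrieve`) are hypothetical, coverage is simplified to "the retrieved set contains at least one item with the query's label", and the guarantee assumes calibration and test queries are exchangeable.

```python
import math

def ranked_indices(query, database):
    """Database indices sorted by Euclidean distance to the query, nearest first."""
    return sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))

def first_hit_rank(query, query_label, database, labels):
    """1-based rank of the nearest database item that shares the query's label.

    This rank is the nonconformity score: the smallest set size that
    would have covered a correct match for this query.
    """
    for rank, idx in enumerate(ranked_indices(query, database), start=1):
        if labels[idx] == query_label:
            return rank
    return math.inf  # no correct item exists; no finite set size covers it

def calibrate_set_size(cal_ranks, alpha):
    """Split-conformal quantile of the calibration ranks.

    Returns a set size K such that, under exchangeability, a new query's
    top-K set contains a correct match with probability >= 1 - alpha.
    """
    n = len(cal_ranks)
    k = math.ceil((n + 1) * (1 - alpha))  # position of the conformal quantile
    if k > n:
        return math.inf  # too few calibration queries for this alpha
    return sorted(cal_ranks)[k - 1]

def retrieve(query, database, set_size):
    """Return the indices of the top-`set_size` nearest database items."""
    order = ranked_indices(query, database)
    return order if math.isinf(set_size) else order[:int(set_size)]
```

In use, one would compute `first_hit_rank` for every query in a held-out calibration set, pass those ranks to `calibrate_set_size` with, say, `alpha=0.1` for 90% coverage, and then call `retrieve` with the resulting size at test time. A fixed set size is the simplest choice. An uncertainty-aware variant could instead adapt the size per query, which is closer in spirit to what the abstract describes.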