Semantic Recall for Vector Search

📅 2026-04-22

🤖 AI Summary
This work addresses a key limitation in how approximate nearest neighbor (ANN) search is usually evaluated: traditional recall penalizes an algorithm for missing any exact nearest neighbor, relevant or not, so it cannot distinguish semantically relevant neighbors from irrelevant ones and can misrepresent retrieval quality. To address this, the authors introduce Semantic Recall, a metric that counts only the semantically relevant objects that exact nearest neighbor search could retrieve, and Tolerant Recall, a proxy that approximates semantic recall when relevant objects cannot be identified. The work brings semantic relevance directly into vector retrieval evaluation and finds that many queries in embedding datasets have few relevant results among their nearest neighbors. Experiments indicate that the proposed metrics reflect retrieval quality more accurately, and that search algorithms optimized for them achieve better cost-quality trade-offs.
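As a rough formalization based only on the description above (the notation is ours, not the paper's): let $A_k(q)$ be the $k$ results an ANN algorithm returns for query $q$, $N_k(q)$ the exact $k$ nearest neighbors, and $\mathcal{R}(q)$ the set of objects semantically relevant to $q$. Traditional recall and semantic recall might then be written as:

```latex
\mathrm{Recall@}k(q) = \frac{\lvert A_k(q) \cap N_k(q) \rvert}{\lvert N_k(q) \rvert},
\qquad
\mathrm{SemanticRecall@}k(q) = \frac{\lvert A_k(q) \cap N_k(q) \cap \mathcal{R}(q) \rvert}{\lvert N_k(q) \cap \mathcal{R}(q) \rvert}
```

Under this reading, semantic recall is undefined when $N_k(q) \cap \mathcal{R}(q)$ is empty; how the paper treats such queries is not stated here.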

📝 Abstract
We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among their nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we find to be common in embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval quality, and that optimizing search algorithms for these metrics can lead to improved cost-quality tradeoffs.
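A minimal Python sketch of the difference between traditional recall@k and semantic recall@k, based on our reading of the abstract: the ground truth for semantic recall is the set of exact top-k neighbors that are also semantically relevant. The function names, the set-based inputs, and returning None when no relevant object appears in the exact top-k are illustrative assumptions, not the authors' implementation.

```python
from typing import Hashable, Iterable, Optional, Sequence


def recall_at_k(retrieved: Sequence[Hashable], exact_top_k: Sequence[Hashable]) -> float:
    """Traditional recall@k: fraction of the exact top-k neighbors that the ANN search returned."""
    truth = set(exact_top_k)
    if not truth:
        raise ValueError("exact_top_k must be non-empty")
    return len(truth.intersection(retrieved)) / len(truth)


def semantic_recall_at_k(
    retrieved: Sequence[Hashable],
    exact_top_k: Sequence[Hashable],
    relevant: Iterable[Hashable],
) -> Optional[float]:
    """Semantic recall@k (sketch): only exact top-k neighbors that are also relevant count.

    Relevant objects outside the exact top-k are ignored, since exact search
    would not retrieve them either. Returns None (an assumption of this sketch)
    when the exact top-k contains no relevant object.
    """
    target = set(exact_top_k) & set(relevant)
    if not target:
        return None
    return len(target.intersection(retrieved)) / len(target)


# Exact top-4 is [1, 2, 3, 4]; only object 1 (and object 7, outside the top-4) is relevant.
ann_results = [1, 5, 6, 2]
print(recall_at_k(ann_results, [1, 2, 3, 4]))                   # 0.5
print(semantic_recall_at_k(ann_results, [1, 2, 3, 4], {1, 7}))  # 1.0
```

In this example traditional recall penalizes the search for missing objects 3 and 4, while semantic recall scores it perfectly because the only recoverable relevant object (1) was returned. This mirrors the abstract's point about queries with few relevant results among their nearest neighbors.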
Problem

Research questions and friction points this paper is trying to address.

Semantic Recall
Approximate Nearest Neighbor Search
Retrieval Quality
Embedding Datasets
Evaluation Metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Recall
Approximate Nearest Neighbor Search
Retrieval Quality
Tolerant Recall
Embedding Datasets