Distance Comparison Operations Are Not Silver Bullets in Vector Similarity Search: A Benchmark Study on Their Merits and Limits

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the practical efficacy and applicability boundaries of distance comparison operations (DCOs) in vector similarity search, addressing the question of whether recent DCO methods are suitable for production-grade vector databases. Through a comprehensive benchmark of eight DCO algorithms on ten datasets (up to 100M vectors and up to 12,288 dimensions) across diverse hardware platforms, including CPUs (with and without SIMD) and GPUs, the work reveals that DCO performance is highly sensitive to data dimensionality and hardware configuration. Notably, DCO efficiency degrades under out-of-distribution queries, and in some settings these methods are even slower than a full-dimensional scan, indicating that they are not yet ready for production deployment. Nevertheless, the study identifies often-overlooked merits: DCOs can accelerate index construction and data updates.
📝 Abstract
Distance Comparison Operations (DCOs), which decide whether the distance between a data vector and a query is within a threshold, are a critical performance bottleneck in vector similarity search. Recent DCO methods that avoid full-dimensional distance computations promise significant speedups, but their readiness for production vector database systems remains an open question. To address this, we conduct a comprehensive benchmark of 8 DCO algorithms across 10 datasets (with up to 100M vectors and 12,288 dimensions) and diverse hardware configurations (CPUs with/without SIMD, and GPUs). Our study reveals that these methods are not silver bullets: their efficiency is highly sensitive to data dimensionality, degrades under out-of-distribution queries, and is unstable across hardware. Yet, our evaluation also demonstrates often-overlooked merits: they can accelerate index construction and data updates. Despite these benefits, their unstable performance, which can be slower than a full-dimensional scan, leads us to conclude that recent algorithmic advancements in DCO are not yet ready for production deployment.
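To make the definition above concrete, here is a minimal sketch of a DCO under squared Euclidean distance. It contrasts a full-dimensional scan with a simple early-abandon variant that stops once the partial distance already exceeds the threshold. The function names, the `block` parameter, and the choice of early abandoning are illustrative assumptions; this is not claimed to be any of the eight algorithms benchmarked in the paper.

```python
from typing import Sequence, Tuple


def dco_full(x: Sequence[float], q: Sequence[float], threshold_sq: float) -> bool:
    """Full-dimensional DCO: compute the whole squared distance, then compare."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, q))
    return dist_sq <= threshold_sq


def dco_early_abandon(
    x: Sequence[float],
    q: Sequence[float],
    threshold_sq: float,
    block: int = 64,  # hypothetical block size, chosen only for illustration
) -> Tuple[bool, int]:
    """Partial-scan DCO: accumulate the squared distance block by block.

    The partial sum never decreases, so once it exceeds the threshold the
    answer is known to be False and the remaining dimensions can be skipped.
    Returns (within_threshold, number_of_dimensions_scanned).
    """
    partial = 0.0
    d = len(x)
    for start in range(0, d, block):
        end = min(start + block, d)
        partial += sum((x[i] - q[i]) ** 2 for i in range(start, end))
        if partial > threshold_sq:
            return False, end  # rejected without touching dimensions >= end
    return True, d  # accepted only after scanning every dimension
```

Both functions return the same decision. The early-abandon variant saves work only when a vector is rejected early, which gives some intuition for why the benefit of avoiding full-dimensional computation depends on dimensionality and on how queries relate to the data.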
Problem

Research questions and friction points this paper is trying to address.

Distance Comparison Operations
Vector Similarity Search
Performance Bottleneck
Production Readiness
High-Dimensional Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distance Comparison Operations
Vector Similarity Search
Benchmark Study
High-Dimensional Data
Hardware-Aware Evaluation
Zhuanglin Zheng
SKLCCSE Lab, BDBC and IRI, Beihang University, Beijing, China
Yuxiang Zeng
Beihang University
Vector Databases, Federated Databases, Spatial Data Analytics
Chenchen Liu
University of Maryland, Baltimore County
High-Performance Computing, Deep Learning, Brain-Inspired Computing, Emerging Memory Technologies
Yunzhen Chi
SKLCCSE Lab, BDBC and IRI, Beihang University, Beijing, China
Binhan Yang
SKLCCSE Lab, BDBC and IRI, Beihang University, Beijing, China
Yongxin Tong
SKLCCSE Lab, BDBC and IRI, Beihang University, Beijing, China