Cardinality Estimation for High Dimensional Similarity Queries with Adaptive Bucket Probing

📅 2026-04-06
🤖 AI Summary
This work addresses the problem of cardinality estimation for similarity queries in high-dimensional spaces by proposing a novel method that balances accuracy and online efficiency. The approach leverages locality-sensitive hashing (LSH) to partition the space and integrates adaptive multi-probe bucket probing, progressive sampling, and asymmetric distance computation. It also supports dynamic data updates, making it suitable for evolving datasets. Experimental results demonstrate that the proposed scheme significantly outperforms existing methods across multiple high-dimensional datasets, achieving high estimation accuracy while substantially improving online query response time. The method is thus well-suited for large-scale applications involving both static and dynamic data.
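The LSH partitioning and multi-probe step summarized above can be illustrated with a minimal Python sketch. It assumes Euclidean distance and E2LSH-style p-stable hash functions in a single hash table. The class `PStableLSH`, its parameters (`num_hashes`, `width`), the helper `estimate_cardinality`, and the rule that the probe radius grows as `ceil(tau / width)` are hypothetical choices for illustration, not the paper's adaptive probing scheme. Counting exact distances over the probed buckets gives a lower bound on the true cardinality, because matches that fall into unprobed buckets are missed.

```python
import itertools
import math
from collections import defaultdict

import numpy as np


class PStableLSH:
    """Hypothetical single-table p-stable LSH index (E2LSH-style hash functions)."""

    def __init__(self, dim, num_hashes=4, width=4.0, seed=0):
        rng = np.random.default_rng(seed)
        # Gaussian projections are 2-stable, so points close in L2 tend to collide.
        self.a = rng.standard_normal((num_hashes, dim))
        self.b = rng.uniform(0.0, width, size=num_hashes)
        self.width = width
        self.buckets = defaultdict(list)

    def key(self, v):
        # Compound key: one integer per hash, h_i(v) = floor((a_i . v + b_i) / w).
        return tuple(np.floor((self.a @ v + self.b) / self.width).astype(int).tolist())

    def insert(self, idx, v):
        self.buckets[self.key(v)].append(idx)

    def probe(self, q, tau):
        """Collect ids from the query's bucket and neighbouring buckets.

        Assumed heuristic: shift each key coordinate by up to r = ceil(tau / w),
        so larger distance thresholds explore more neighbouring buckets.
        """
        base = self.key(q)
        r = max(0, math.ceil(tau / self.width))
        candidates = []
        for delta in itertools.product(range(-r, r + 1), repeat=len(base)):
            neighbour = tuple(k + d for k, d in zip(base, delta))
            candidates.extend(self.buckets.get(neighbour, []))
        return candidates


def estimate_cardinality(index, data, q, tau):
    """Count probed candidates whose exact L2 distance to q is within tau."""
    ids = np.asarray(index.probe(q, tau), dtype=int)
    if ids.size == 0:
        return 0
    dists = np.linalg.norm(data[ids] - q, axis=1)
    return int(np.count_nonzero(dists <= tau))
```

Each point lives in exactly one bucket and the probed keys are distinct, so no candidate is counted twice; the cost of probing grows as (2r + 1) to the power `num_hashes`, which is why a real scheme would rank and limit probes rather than enumerate them all.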
📝 Abstract
In this work, we address the problem of cardinality estimation for similarity search in high-dimensional spaces. Our goal is to design a framework that is lightweight, easy to construct, and capable of providing accurate estimates with satisfactory online efficiency. We leverage locality-sensitive hashing (LSH) to partition the vector space while preserving distance proximity. Building on this, we adopt the principles of classical multi-probe LSH to adaptively explore neighboring buckets, accounting for distance thresholds of varying magnitudes. To improve online efficiency, we employ progressive sampling to reduce the number of distance computations and use the asymmetric distance computation of product quantization to accelerate distance calculations in high-dimensional spaces. In addition to handling static datasets, our framework includes an updating algorithm designed to efficiently support large-scale dynamic data updates. Experiments demonstrate that our methods accurately estimate the cardinality of similarity queries with satisfactory efficiency.
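Progressive sampling can be sketched as follows: instead of computing the distance to every candidate returned by bucket probing, evaluate the threshold predicate on a random sample that doubles in size each round, and stop once the sampled fraction looks stable. The function name `progressive_estimate`, the doubling schedule, and the normal-approximation stopping rule (with a finite-population correction) are assumptions made for illustration; the abstract does not specify the paper's schedule or stopping criterion.

```python
import math
import random


def progressive_estimate(candidates, within, rel_err=0.1, z=1.96, start=32, seed=0):
    """Estimate how many candidates satisfy within(c) from progressively larger samples.

    Hypothetical stopping rule: stop once the normal-approximation confidence
    half-width of the sampled fraction is at most rel_err * fraction.
    Returns (estimate, number_of_predicate_evaluations).
    """
    n = len(candidates)
    if n == 0:
        return 0.0, 0
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random order = sampling without replacement
    hits = evaluated = 0
    batch = start
    while evaluated < n:
        stop = min(n, evaluated + batch)
        for i in order[evaluated:stop]:
            hits += within(candidates[i])  # each call stands for one distance computation
        evaluated = stop
        p = hits / evaluated
        if evaluated == n:
            break  # every candidate checked: the count is exact
        # Half-width of the confidence interval, with finite-population correction.
        half = z * math.sqrt(p * (1 - p) / evaluated) * math.sqrt((n - evaluated) / (n - 1))
        if p > 0 and half <= rel_err * p:
            break
        batch *= 2  # geometric schedule: double the sample each round
    return p * n, evaluated
```

The normal approximation is crude when the sampled fraction is 0 or 1 (an all-match first batch stops immediately), so a production estimator would likely use a more careful bound.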
Problem

Research questions and friction points this paper is trying to address.

cardinality estimation
similarity search
high-dimensional spaces
adaptive bucket probing
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive bucket probing
cardinality estimation
locality-sensitive hashing
progressive sampling
asymmetric distance computation
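Asymmetric distance computation (ADC) in product quantization keeps the query exact and compresses each database vector into one centroid code per subspace. For each query, an m-by-k table of squared sub-distances is built once, after which every database distance costs only m table lookups instead of a full d-dimensional computation. The sketch below is a minimal NumPy version; the helper names `train_pq`, `encode`, and `adc_sq_distances`, the plain Lloyd k-means training, and the defaults m = 4, k = 16 are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def train_pq(data, m=4, k=16, iters=10, seed=0):
    """Train a product quantizer: m subspaces, k centroids each (plain Lloyd k-means)."""
    n, d = data.shape
    assert d % m == 0, "dimension must be divisible by m"
    sub = d // m
    rng = np.random.default_rng(seed)
    codebooks = np.empty((m, k, sub))
    for j in range(m):
        x = data[:, j * sub:(j + 1) * sub]
        cent = x[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((x[:, None, :] - cent[None]) ** 2).sum(-1), axis=1)
            for c in range(k):
                members = x[assign == c]
                if len(members):  # keep the old centroid if a cluster empties
                    cent[c] = members.mean(axis=0)
        codebooks[j] = cent
    return codebooks


def encode(data, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    m, k, sub = codebooks.shape
    codes = np.empty((data.shape[0], m), dtype=np.int64)
    for j in range(m):
        x = data[:, j * sub:(j + 1) * sub]
        codes[:, j] = np.argmin(((x[:, None, :] - codebooks[j][None]) ** 2).sum(-1), axis=1)
    return codes


def adc_sq_distances(query, codes, codebooks):
    """Squared distances from an exact query to PQ-encoded database vectors."""
    m, k, sub = codebooks.shape
    q = query.reshape(m, 1, sub)
    table = ((codebooks - q) ** 2).sum(-1)         # (m, k): query-to-centroid sub-distances
    return table[np.arange(m), codes].sum(axis=1)  # (n,): m lookups per database vector
```

For cardinality estimation, a candidate would be counted when `adc_sq_distances(q, codes, codebooks) <= tau ** 2`; because the database side is quantized, this predicate is approximate.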