🤖 AI Summary
To address the high computational cost, low accuracy, and poor generalizability of distance computation in high-dimensional approximate k-nearest neighbor (AKNN) search, this paper proposes a data-distribution-aware orthogonal projection distance estimation method with a decoupled, data-driven correction scheme. It is the first to incorporate explicit data distribution modeling into orthogonal projection-based distance estimation, and it fully decouples distance approximation from correction, thereby jointly optimizing efficiency, accuracy, and generality. The approach integrates orthogonal projection for dimensionality reduction, a lightweight data-driven correction model, high-dimensional index optimization, and accelerated distance computation mechanisms. Extensive experiments on multiple real-world datasets demonstrate that the method achieves a 1.6–2.1× speedup over ADSampling while significantly improving recall and distance estimation accuracy.
📝 Abstract
Approximate K-Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. In AKNN search, distance computation is the core task that dominates the runtime. Existing approaches typically use approximate distances to improve computational efficiency, often at the cost of reduced search accuracy. To address this issue, the state-of-the-art method, ADSampling, employs random projections to estimate approximate distances and introduces an additional distance correction process to mitigate accuracy loss. However, ADSampling has limitations in both effectiveness and generality, primarily because both its distance approximation and its correction rely on random projections. To improve effectiveness, we leverage the data distribution to speed up distance computation via orthogonal projection. To improve generality, we adopt a data-driven approach to distance correction, decoupling the correction process from the distance approximation process. Extensive experiments demonstrate the superiority and effectiveness of our method. In particular, compared to ADSampling, our method achieves a speedup of 1.6–2.1× on real-world datasets while providing higher accuracy.
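To make the two ideas in the abstract concrete, here is a minimal, illustrative sketch (not the paper's actual algorithm): a data-distribution-aware orthogonal projection (PCA-style axes from the data, so energy concentrates in the leading dimensions) used to estimate squared distances cheaply, followed by a decoupled, data-driven correction, here simplified to a single scalar factor `alpha` fitted on sampled pairs. All names (`approx_sqdist`, `corrected_sqdist`, `alpha`) and the toy dataset are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1000 points in 64 dims with low intrinsic dimension,
# so a data-aware orthogonal projection concentrates most of the
# distance "energy" in the first few projected coordinates.
n, D, d = 1000, 64, 16
base = rng.normal(size=(n, 8)) @ rng.normal(size=(8, D))
X = base + 0.05 * rng.normal(size=(n, D))

# Data-distribution-aware orthogonal projection: principal axes of X.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
P = Vt[:d].T  # D x d orthonormal projection matrix

def approx_sqdist(q, x):
    """Cheap estimate: squared norm of the first d projected coordinates.
    Since P is orthonormal, this never exceeds the true squared distance."""
    delta = (q - x) @ P
    return float(delta @ delta)

def true_sqdist(q, x):
    delta = q - x
    return float(delta @ delta)

# Decoupled, data-driven correction: fit a scalar alpha by least squares
# on sampled pairs so that alpha * estimate tracks the true distance.
# (The paper's correction model is richer; a scalar keeps the sketch small.)
pairs = rng.integers(0, n, size=(2000, 2))
est = np.array([approx_sqdist(X[i], X[j]) for i, j in pairs])
true = np.array([true_sqdist(X[i], X[j]) for i, j in pairs])
alpha = float(true @ est / (est @ est))

def corrected_sqdist(q, x):
    """Corrected estimate used for candidate pruning in AKNN search."""
    return alpha * approx_sqdist(q, x)
```

Because the projection only drops coordinates, `approx_sqdist` is a lower bound on the true squared distance, which is what makes it safe for pruning candidates; the learned factor (`alpha >= 1` here) then removes the systematic underestimation, and since it is fitted separately from the projection, the same correction recipe applies regardless of how the approximation is produced.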