🤖 AI Summary
This paper addresses the angle testing problem in high-dimensional Euclidean spaces. It proposes two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Rather than drawing random projection vectors from a Gaussian distribution, the approach uses reference angles and a deterministic structure for the projection vectors. The kernel functions do not rely on asymptotic assumptions (e.g., the number of projection vectors tending to infinity) and are shown, both theoretically and experimentally, to outperform Gaussian-distribution-based kernel functions. Applied to approximate nearest neighbor search (ANNS), the method achieves 2.5× to 3× higher query throughput (QPS) than the graph-based search algorithm HNSW.
📝 Abstract
In this paper, we study the angle testing problem in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and employs a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and can be shown both theoretically and experimentally to outperform Gaussian-distribution-based kernel functions. We further apply the proposed kernel functions to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves 2.5× to 3× higher queries-per-second (QPS) throughput than the state-of-the-art graph-based search algorithm HNSW.
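For context, the Gaussian random-projection baseline that the abstract contrasts against can be sketched as follows. This is not the proposed kernel; it is a minimal illustration of the standard sign-random-projection estimator, in which two vectors fall on opposite sides of a random Gaussian hyperplane with probability equal to their angle divided by π. The function names, the number of projections, and the seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def exact_angle(u, v):
    """Exact angle (in radians) between two nonzero vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def estimate_angle_gaussian(u, v, num_projections=4096, seed=0):
    """Estimate the angle between u and v with Gaussian sign projections.

    Hypothetical helper for illustration: the fraction of random
    hyperplanes that separate u and v approximates angle / pi.
    """
    rng = np.random.default_rng(seed)
    # Each row is one random projection vector with i.i.d. N(0, 1) entries.
    projections = rng.standard_normal((num_projections, u.shape[0]))
    sign_u = projections @ u >= 0
    sign_v = projections @ v >= 0
    disagreement = np.mean(sign_u != sign_v)
    return float(np.pi * disagreement)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    u = rng.standard_normal(128)
    v = rng.standard_normal(128)
    print("exact:    ", exact_angle(u, v))
    print("estimated:", estimate_angle_gaussian(u, v))
```

The estimate only converges to the true angle as the number of projections grows, which is the kind of asymptotic dependence the abstract says the proposed kernel functions avoid.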