🤖 AI Summary
This work addresses the inefficiency of nearest-neighbor search in non-Euclidean spaces under Bregman divergences, particularly the Kullback–Leibler divergence. It proposes the first efficient k-d tree framework supporting arbitrary decomposable Bregman divergences. The key insight is proving, and then exploiting, the fact that k-d tree pruning does not require the triangle inequality, freeing the data structure from traditional metric-space constraints. Leveraging the coordinate-wise decomposability of these divergences, the authors redesign tree construction and traversal with divergence-specific distance computations and pruning strategies, and implement the result in C++. Experiments show over a 100× speedup versus brute-force search in dimensions up to 100, with both exact and approximate queries outperforming competing methods. The framework provides a scalable, high-accuracy nearest-neighbor search paradigm for canonical Bregman spaces, such as spaces of probability vectors.
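To make "coordinate-wise decomposability" concrete, here is a minimal C++ sketch of the (generalized) KL divergence computed as a sum of independent per-coordinate terms, together with the brute-force nearest-neighbor baseline the paper benchmarks against. The generator x·log x, the divergence direction (query to data point), and the function names are assumptions for illustration, not the paper's actual API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Per-coordinate term of the generalized KL divergence (Bregman divergence
// generated by x*log(x)). Assumes strictly positive coordinates.
// Decomposability means the full divergence is just the sum of these terms.
double kl_term(double p, double q) {
    return p * std::log(p / q) - p + q;
}

// Direction convention (an assumption here): divergence from the query p
// to a data point q, summed coordinate by coordinate.
double kl_divergence(const std::vector<double>& p, const std::vector<double>& q) {
    double d = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) d += kl_term(p[i], q[i]);
    return d;
}

// Brute-force nearest neighbor under KL: the naive baseline against which
// the k-d tree achieves its reported speedup. Returns the index of the
// point with the smallest divergence from the query.
std::size_t nn_brute_force(const std::vector<std::vector<double>>& pts,
                           const std::vector<double>& query) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double d = kl_divergence(query, pts[i]);
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}
```

Because every coordinate contributes an independent term, the tree can reason about each splitting dimension separately, which is exactly what k-d tree construction and pruning need.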
📝 Abstract
The contributions of this paper span theory and implementation. First, we prove that k-d trees can be extended to spaces in which distance is measured by an arbitrary Bregman divergence. Perhaps surprisingly, this shows that the triangle inequality is not necessary for correct pruning in k-d trees. Second, we offer an efficient algorithm and C++ implementation of nearest-neighbour search for decomposable Bregman divergences. The implementation supports the Kullback–Leibler divergence (relative entropy), a popular distance between probability vectors that is commonly used in statistics and machine learning. This is a step toward broadening the usage of computational geometry algorithms. Our benchmarks show that our implementation efficiently handles both exact and approximate nearest-neighbour queries. Compared to a naive approach, we achieve a two-orders-of-magnitude speedup in practical scenarios in dimensions up to 100. Our solution is simpler and more efficient than competing methods.
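The claim that pruning needs no triangle inequality can be illustrated with a short sketch: to decide whether a subtree can be skipped, it suffices to lower-bound the divergence from the query to the subtree's axis-aligned bounding box, and decomposability lets each coordinate be bounded independently. The sketch below uses the generalized KL generator; since each per-coordinate term is convex in its second argument with minimum at q = p, clamping the query into the box minimizes every term. This is the standard box lower-bound idea, not necessarily the paper's exact traversal procedure, and all names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-coordinate generalized KL term (assumes positive coordinates),
// measured from the query coordinate p to a data coordinate q.
double kl_term(double p, double q) {
    return p * std::log(p / q) - p + q;
}

// Lower bound on the KL divergence from the query to any point inside the
// axis-aligned box [lo, hi]. kl_term(p, q) is convex in q with its minimum
// at q = p, so clamping p into [lo_i, hi_i] minimizes coordinate i
// independently. Only decomposability and per-coordinate convexity are
// used; no triangle inequality appears anywhere.
double kl_lower_bound(const std::vector<double>& query,
                      const std::vector<double>& lo,
                      const std::vector<double>& hi) {
    double bound = 0.0;
    for (std::size_t i = 0; i < query.size(); ++i) {
        double q = std::clamp(query[i], lo[i], hi[i]);
        bound += kl_term(query[i], q);
    }
    return bound;
}

// Pruning rule during traversal: a subtree whose bounding box cannot
// contain anything closer than the best divergence found so far is skipped.
bool can_prune(double lower_bound, double best_so_far) {
    return lower_bound >= best_so_far;
}
```

If the query lies inside the box the bound is zero and the subtree must be visited; otherwise the bound grows with the gap between the query and the box, which is what makes the two-orders-of-magnitude speedup over brute force possible.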