ProHD: Projection-Based Hausdorff Distance Approximation

📅 2025-11-22
🤖 AI Summary
Computing the exact Hausdorff distance (HD) for large-scale, high-dimensional data is computationally prohibitive, which limits its practical use. To address this, we propose ProHD, a projection-based algorithm that combines centroid-aligned axes with principal component analysis (PCA) to select representative projection directions. Projecting the data onto these directions efficiently identifies a small subset of extremal points, on which an approximate HD is computed. ProHD comes with theoretical guarantees: its estimate is always a lower bound on the true HD, and the approximation error is bounded. On datasets of up to 2 million points in 256 dimensions, ProHD achieves a 10–100× speedup over exact algorithms and 5–20× lower estimation error than random sampling. Its lightweight design supports integration into vector database retrieval systems and real-time processing of streaming data, which makes HD computation practical at scales where exact methods are not.
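For reference, the standard definition of the Hausdorff distance and one way a subset-based estimate can be guaranteed never to exceed it are sketched below. The candidate subsets $A'$ and $B'$ and the estimate $\hat{H}$ are notation introduced here for illustration, and the choice to compare each candidate against the full opposite set is an assumption; the summary does not spell out this detail.

```latex
% Directed and symmetric Hausdorff distance between finite sets A and B
\[
h(A,B) = \max_{a \in A} \, \min_{b \in B} \lVert a - b \rVert,
\qquad
H(A,B) = \max\bigl\{ h(A,B),\; h(B,A) \bigr\}.
\]

% Restricting the outer maximum to a subset can only lower the value
\[
A' \subseteq A \;\Longrightarrow\; h(A',B) \le h(A,B),
\qquad
B' \subseteq B \;\Longrightarrow\; h(B',A) \le h(B,A).
\]

% Hence the subset-based estimate is a lower bound on the exact distance
\[
\hat{H} = \max\bigl\{ h(A',B),\; h(B',A) \bigr\} \le H(A,B).
\]
```

Note that the inequality relies on shrinking only the outer maximum. Shrinking the set inside the inner minimum can increase nearest-neighbour distances, so it does not preserve the lower bound by itself.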

📝 Abstract
The Hausdorff distance (HD) is a robust measure of set dissimilarity, but computing it exactly on large, high-dimensional datasets is prohibitively expensive. We propose ProHD, a projection-guided approximation algorithm that dramatically accelerates HD computation while maintaining high accuracy. ProHD identifies a small subset of candidate "extreme" points by projecting the data onto a few informative directions (such as the centroid axis and top principal components) and then computes the HD on this subset. This approach guarantees an underestimate of the true HD with a bounded additive error and typically achieves results within a few percent of the exact value. In extensive experiments on image, physics, and synthetic datasets (up to two million points in $D=256$), ProHD runs 10--100$\times$ faster than exact algorithms while attaining 5--20$\times$ lower error than random sampling-based approximations. Our method enables practical HD calculations in scenarios such as large vector databases and streaming data, where quick and reliable set distance estimation is needed.
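The projection idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`approx_hd`, `extremes`), the parameter `k`, and the use of a single direction (the axis joining the two centroids) are hypothetical simplifications. The sketch also assumes that each candidate is compared against the full opposite set, which keeps the estimate a lower bound; the abstract does not state how the subset distance is evaluated.

```python
import math
import random


def directed_hd(src, dst):
    """Directed Hausdorff distance: worst nearest-neighbour distance from src into dst."""
    return max(min(math.dist(p, q) for q in dst) for p in src)


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(c) / n for c in zip(*points)]


def extremes(points, direction, k):
    """Return the k points with the smallest and the k with the largest projection."""
    order = sorted(points, key=lambda p: sum(x * d for x, d in zip(p, direction)))
    return order[:k] + order[-k:]


def approx_hd(A, B, k=5):
    # Hypothetical simplification: one direction only, the axis joining the centroids.
    ca, cb = centroid(A), centroid(B)
    axis = [b - a for a, b in zip(ca, cb)]
    cand_a = extremes(A, axis, k)
    cand_b = extremes(B, axis, k)
    # Assumption: candidates are checked against the FULL opposite set,
    # so the outer maximum shrinks and the result cannot exceed the exact HD.
    return max(directed_hd(cand_a, B), directed_hd(cand_b, A))


def exact_hd(A, B):
    """Brute-force reference value for comparison."""
    return max(directed_hd(A, B), directed_hd(B, A))


random.seed(0)
# Two synthetic 3-D clouds, offset along the first coordinate.
A = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
B = [(random.gauss(3, 1), random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
estimate = approx_hd(A, B)
exact = exact_hd(A, B)
print(f"estimate={estimate:.3f} exact={exact:.3f}")
```

In this sketch the approximation evaluates 2k candidates per set against the opposite set instead of every point, so its cost grows with k rather than with the set size on the candidate side. How close the estimate comes to the exact value depends on the data and on the directions used.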

Problem

Research questions and friction points this paper is trying to address.

Accelerating Hausdorff distance computation for large datasets
Approximating set dissimilarity with a bounded additive error
Enabling practical distance calculations in high-dimensional spaces
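To make the cost of the exact computation concrete, the following brute-force sketch counts pairwise distance evaluations. The function name `exact_hausdorff` and the toy point sets are illustrative only. The nested loops visit every pair in both directions, so the count is 2·|A|·|B|, and each evaluation itself costs time proportional to the dimension.

```python
import math


def exact_hausdorff(A, B):
    """Brute-force Hausdorff distance; also returns the number of distance evaluations."""
    evaluations = 0

    def directed(src, dst):
        nonlocal evaluations
        worst = 0.0
        for p in src:
            nearest = math.inf
            for q in dst:
                evaluations += 1
                nearest = min(nearest, math.dist(p, q))
            # Keep the largest nearest-neighbour distance seen so far.
            worst = max(worst, nearest)
        return worst

    distance = max(directed(A, B), directed(B, A))
    return distance, evaluations


A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(0.0, 0.0), (4.0, 0.0)]
distance, evaluations = exact_hausdorff(A, B)
print(distance, evaluations)  # 3.0 12
```

The point (4, 0) in B is 3 units from its nearest neighbour (1, 0) in A, which determines the result. With n points in each set, the evaluation count grows as 2n², which is what makes the exact computation impractical at the scales discussed here.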

Innovation

Methods, ideas, or system contributions that make the work stand out.

Projection-guided approximation algorithm for Hausdorff distance
Identifies extreme points using informative projection directions
Achieves high speedup with bounded additive error guarantee
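As an illustration of how a principal direction can expose extreme points, the sketch below estimates the leading principal component by power iteration and picks the points at both ends of the projection. The names `top_principal_direction` and `extreme_indices`, the iteration count, and the synthetic data are assumptions made for this example; the source does not say how ProHD computes its principal components or how many points it keeps per direction.

```python
import math
import random


def top_principal_direction(points, iters=100, seed=0):
    """Approximate the leading principal component by power iteration on centered data."""
    n, dim = len(points), len(points[0])
    mean = [sum(c) / n for c in zip(*points)]
    centered = [[x - m for x, m in zip(p, mean)] for p in points]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # w = X^T (X v): the covariance matrix applied to v, up to a 1/n factor.
        proj = [sum(x * y for x, y in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data, e.g. all points identical
        v = [x / norm for x in w]
    return v


def extreme_indices(points, direction, k):
    """Indices of the k smallest and k largest projections onto the direction."""
    scores = [sum(x * d for x, d in zip(p, direction)) for p in points]
    order = sorted(range(len(points)), key=scores.__getitem__)
    return order[:k] + order[-k:]


rng = random.Random(1)
# Synthetic cloud stretched along the first axis.
cloud = [(rng.gauss(0, 10), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
direction = top_principal_direction(cloud)
picked = extreme_indices(cloud, direction, k=3)
```

Because the cloud varies most along its first coordinate, the recovered direction aligns closely with that axis (up to sign), and the selected indices are the points lying furthest out along it. Power iteration is used here only to keep the example dependency-free; a library eigendecomposition or SVD would serve the same purpose.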