A Universal Nearest-Neighbor Estimator for Intrinsic Dimensionality

📅 2026-03-11
🤖 AI Summary
Estimating the intrinsic dimensionality of high-dimensional data is crucial in machine learning and computer vision, yet existing methods often fail when their geometric or distributional assumptions are violated. This work proposes a simple nonparametric estimator based on nearest-neighbor distance ratios that does not rely on assumptions about the distribution generating the data. The authors prove that the estimator is universal: it converges to the true intrinsic dimension independently of that distribution. Experiments on benchmark manifolds and real-world datasets show that it achieves state-of-the-art results.
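
For intuition about why a ratio of nearest-neighbor distances can reveal dimension, the following is the standard local-homogeneity argument behind the TwoNN estimator of Facco et al. (2017). It is a swapped-in illustration, not the universal estimator proposed in this paper. It assumes that the density is roughly constant out to each point's second neighbor. Under that assumption, conditional on the second-neighbor distance the first neighbor is uniform in the d-dimensional ball, so (r_{i,1}/r_{i,2})^d is Uniform(0,1). The symbols r_{i,1}, r_{i,2} (first and second nearest-neighbor distances of point i) and N (number of points) are introduced here for illustration.

```latex
\mu_i = \frac{r_{i,2}}{r_{i,1}}, \qquad
\Pr(\mu_i > t) = t^{-d} \quad (t \ge 1)
\;\Longrightarrow\;
\log \mu_i \sim \operatorname{Exponential}(d),
\qquad
\hat{d} = \frac{N}{\sum_{i=1}^{N} \log \mu_i}
```

The unknown density value cancels in the ratio, and \hat{d} maximizes the likelihood \prod_i d\,\mu_i^{-d-1}. The abstract's universality claim, convergence regardless of the data-generating distribution, concerns the authors' own estimator, which this sketch does not reproduce.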

📝 Abstract
Estimating the intrinsic dimensionality (ID) of data is a fundamental problem in machine learning and computer vision, providing insight into the true degrees of freedom underlying high-dimensional observations. Existing methods often rely on geometric or distributional assumptions and can fail significantly when these assumptions are violated. In this paper, we introduce a novel ID estimator based on nearest-neighbor distance ratios that is simple to compute and achieves state-of-the-art results. Most importantly, we provide a theoretical analysis proving that our estimator is universal, that is, it converges to the true ID independently of the distribution generating the data. We present experimental results on benchmark manifolds and real-world datasets to demonstrate the performance of our estimator.
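
To make "an ID estimator based on nearest-neighbor distance ratios" concrete, here is a minimal NumPy sketch of a TwoNN-style maximum-likelihood estimator (after Facco et al., 2017). It is a swapped-in, well-known technique and not the universal estimator this paper proposes, whose formula is not given here. The function name two_nn_dimension is hypothetical. The sketch assumes Euclidean distances, brute-force O(N^2) neighbor search, and local homogeneity of the density.

```python
import numpy as np


def two_nn_dimension(points: np.ndarray) -> float:
    """Estimate intrinsic dimension from second/first nearest-neighbour distance ratios.

    Hypothetical TwoNN-style maximum-likelihood sketch (after Facco et al., 2017);
    this is NOT the universal estimator proposed in the paper.
    """
    x = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    # (O(N^2) memory, fine for a few thousand points).
    sq_norms = np.einsum("ij,ij->i", x, x)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negative values caused by round-off
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    # Put the smallest and second-smallest entries of each row at columns 0 and 1.
    nearest = np.partition(d2, (0, 1), axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])
    keep = r1 > 0.0               # exact duplicates make the ratio undefined
    log_mu = np.log(r2[keep] / r1[keep])
    # Assuming log(mu) ~ Exponential(d), the maximum-likelihood estimate is N / sum(log mu).
    return float(keep.sum() / log_mu.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A 2-D square isometrically embedded in R^10 via a random orthonormal basis.
    square = rng.uniform(size=(1000, 2))
    basis, _ = np.linalg.qr(rng.standard_normal((10, 2)))
    print("square in R^10:", two_nn_dimension(square @ basis.T))
    # A 1-D circle in R^2.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=1000)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])
    print("circle in R^2:", two_nn_dimension(circle))
```

Only the two nearest neighbors of each point enter the estimate, which keeps it local and cheap. In a sketch like this, the estimates should land near 2 and 1 respectively, up to sampling noise of roughly d/sqrt(N). For large N, a KD-tree or approximate neighbor search would replace the dense distance matrix.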

Problem

Research questions and friction points this paper is trying to address.

intrinsic dimensionality
nearest-neighbor
dimensionality estimation
machine learning
computer vision

Innovation

Methods, ideas, or system contributions that make the work stand out.

intrinsic dimensionality
nearest-neighbor
universal estimator
distance ratios
distribution-free