eDCF: Estimating Intrinsic Dimension using Local Connectivity

📅 2025-10-18
🤖 AI Summary
Estimating the intrinsic dimension (ID) of high-dimensional data is highly sensitive to noise and to the choice of scale: at very fine scales noise inflates the estimate, while at coarser scales estimates stabilize to lower, scale-invariant values. To address this, we propose eDCF, a multi-scale ID estimation method grounded in local connectivity, which uses the Connectivity Factor (CF) as its underlying measure. Our approach integrates sliding-window analysis with parallelization for scalable computation, aiming to balance noise robustness, scale adaptivity, and computational efficiency. On synthetic benchmarks with noisy samples, our method attains mean absolute error (MAE) comparable to leading estimators and the highest exact-match rate, up to 25.0% versus 16.7% for the Maximum Likelihood Estimator (MLE) and 12.5% for TWO-NN. It also detects the fractal geometry of decision boundaries, which supports its use on realistic, structured data.

πŸ“ Abstract
Modern datasets often contain high-dimensional features exhibiting complex dependencies. To effectively analyze such data, dimensionality reduction methods rely on estimating the dataset's intrinsic dimension (id) as a measure of its underlying complexity. However, estimating id is challenging due to its dependence on scale: at very fine scales, noise inflates id estimates, while at coarser scales, estimates stabilize to lower, scale-invariant values. This paper introduces a novel, scalable, and parallelizable method called eDCF, which is based on Connectivity Factor (CF), a local connectivity-based metric, to robustly estimate intrinsic dimension across varying scales. Our method consistently matches leading estimators, achieving comparable values of mean absolute error (MAE) on synthetic benchmarks with noisy samples. Moreover, our approach also attains higher exact intrinsic dimension match rates, reaching up to 25.0% compared to 16.7% for MLE and 12.5% for TWO-NN, particularly excelling under medium to high noise levels and large datasets. Further, we showcase our method's ability to accurately detect fractal geometries in decision boundaries, confirming its utility for analyzing realistic, structured data.
Problem

Research questions and friction points this paper is trying to address.

Estimating intrinsic dimension in high-dimensional noisy datasets
Addressing scale-dependent challenges in dimensionality reduction
Detecting fractal geometries in complex data structures
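
To make the noise problem above concrete, here is a minimal sketch of a baseline estimator, not of eDCF itself (the page does not describe the CF computation). It uses the maximum-likelihood form of the TWO-NN idea, d ≈ N / Σ log(r2/r1), where r1 and r2 are each point's first and second nearest-neighbor distances. The function name `two_nn_dimension`, the sample size, and the noise level are illustrative assumptions.

```python
import numpy as np

def two_nn_dimension(points):
    """Baseline ID estimate from nearest-neighbor distance ratios.

    Assumption: this is the maximum-likelihood variant of TWO-NN,
    d = N / sum(log(r2 / r1)); it is NOT the eDCF method.
    """
    # Pairwise squared distances via the Gram-matrix identity.
    sq = np.sum(points**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (points @ points.T)
    d2 = np.maximum(d2, 0.0)          # clip tiny negative round-off
    np.fill_diagonal(d2, np.inf)      # exclude each point's self-distance
    # Columns 0 and 1 hold the two smallest squared distances per row.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    # log(r2 / r1) = 0.5 * log(r2^2 / r1^2)
    log_ratio = 0.5 * np.log(nearest[:, 1] / nearest[:, 0])
    return len(points) / np.sum(log_ratio)

rng = np.random.default_rng(0)
n = 1000

# A 2-dimensional plane embedded in 5 ambient dimensions.
clean = np.zeros((n, 5))
clean[:, :2] = rng.uniform(0.0, 1.0, size=(n, 2))

# The same plane with isotropic Gaussian noise in all 5 dimensions.
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)

clean_est = two_nn_dimension(clean)
noisy_est = two_nn_dimension(noisy)
print(f"clean: {clean_est:.2f}, noisy: {noisy_est:.2f}")
```

Because this estimator only looks at the two nearest neighbors, it works at the finest available scale, so the noisy estimate drifts from the true value of 2 toward the ambient dimension. That is the fine-scale inflation described above.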
Innovation

Methods, ideas, or system contributions that make the work stand out.

eDCF uses local connectivity for dimension estimation
Method robustly estimates dimension across varying scales
Achieves higher exact match rates under noise
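
The two evaluation quantities mentioned on this page, mean absolute error and exact-match rate, can be sketched as follows. The page does not define "exact match"; the code assumes it means the estimate rounded to the nearest integer equals the true dimension. The function name `id_metrics` and the sample numbers are hypothetical, not results from the paper.

```python
import numpy as np

def id_metrics(estimates, true_dims):
    """Return (MAE, exact-match rate) for a set of ID estimates.

    Assumption: an "exact match" is a rounded estimate equal to the
    true integer dimension; the source does not spell this out.
    """
    est = np.asarray(estimates, dtype=float)
    true = np.asarray(true_dims, dtype=float)
    mae = float(np.mean(np.abs(est - true)))
    # np.rint rounds to the nearest integer (ties go to the even value).
    exact_rate = float(np.mean(np.rint(est) == true))
    return mae, exact_rate

# Made-up estimates for four datasets with known dimensions.
mae, exact_rate = id_metrics([2.1, 3.6, 5.0, 9.2], [2, 3, 5, 10])
print(f"MAE = {mae:.3f}, exact-match rate = {exact_rate:.1%}")
```

The two metrics can disagree: an estimator may have low MAE while rarely landing on the exact integer, which is why the abstract reports both.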