🤖 AI Summary
In high-dimensional settings (hundreds of dimensions), explicit computation of the k-nearest-neighbor (k-NN) radius in the joint space, which is required for k-NN–based estimation of normalized mutual information, suffers from numerical overflow because Euclidean distances are amplified exponentially with dimension. This work proposes the first logarithmic-space transformation designed specifically for k-NN radius computation: it relocates the distance exponentiation into the log domain, eliminating overflow at the level of the numerical analysis. The method introduces no approximation or dimensionality reduction; we prove theoretically that its estimation bias matches, and remains as controllable as, that of standard k-NN estimators. Empirically, it avoids overflow entirely on real and synthetic datasets with hundreds of dimensions while preserving mutual-information estimation accuracy. This contribution establishes a stable, accurate, and plug-and-play numerical foundation for quantifying statistical dependence in high-dimensional spaces.
📝 Abstract
Mutual information provides a powerful, general-purpose measure of the amount of information shared between variables. Estimating normalized mutual information with a k-nearest-neighbor (k-NN) approach requires computing the scale-invariant k-NN radius. This radius calculation suffers from numerical overflow when the joint dimensionality of the data becomes high, typically in the range of several hundred dimensions. To address this issue, we propose a logarithmic transformation technique that improves the numerical stability of the radius calculation in high-dimensional spaces. Applying the proposed transformation during the radius calculation avoids numerical overflow while maintaining precision. The proposed transformation is validated through both theoretical analysis and empirical evaluation, demonstrating that it stabilizes the calculation without compromising the precision of the results.
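The overflow mechanism and the log-space remedy can be illustrated with a minimal sketch. This is not the paper's implementation: the function name and the use of the d-ball volume term (which appears in KSG-style k-NN entropy estimators and involves raising the radius to the power of the dimension) are illustrative assumptions.

```python
import math

def log_ball_volume(d: int, log_r: float) -> float:
    """Log-volume of a d-dimensional Euclidean ball, computed entirely in log space.

    Direct evaluation of V = pi^(d/2) / Gamma(d/2 + 1) * r^d overflows for large d,
    because r**d grows (or shrinks) exponentially with the dimension.  Working with
    log V = (d/2)*log(pi) - lgamma(d/2 + 1) + d*log(r) keeps every term moderate.
    """
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1) + d * log_r

# Direct exponentiation of a k-NN radius overflows in high dimensions:
d, r = 600, 5.0
try:
    _ = r ** d                      # 5.0 ** 600 exceeds the float range
except OverflowError:
    pass                            # overflow in the direct computation

# The same quantity is perfectly representable in the log domain:
log_v = log_ball_volume(d, math.log(r))
assert math.isfinite(log_v)
```

For small dimensions the log-space result agrees with the direct formula, e.g. `math.exp(log_ball_volume(3, math.log(2.0)))` recovers the volume of a 3-ball of radius 2, `(4/3)*pi*8`.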