Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions

📅 2024-10-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In high-dimensional settings (hundreds of dimensions), explicit computation of the k-nearest neighbor (k-NN) radius in the joint space, which is required for k-NN-based estimation of normalized mutual information, suffers from numerical overflow because Euclidean distances are amplified exponentially. This work proposes the first logarithmic-space transformation method specifically designed for k-NN radius computation: it moves the distance exponentiation into the log domain, thereby eliminating the overflow at the numerical level. The method introduces no approximation or dimensionality reduction; the authors show theoretically that its estimation bias matches that of standard k-NN estimators and remains equally controllable. Empirically, it avoids overflow on real and synthetic datasets with hundreds of dimensions while preserving mutual information estimation accuracy. The contribution provides a stable, accurate, plug-and-play numerical foundation for quantifying statistical dependence in high-dimensional spaces.

📝 Abstract
Mutual information provides a powerful, general-purpose metric for quantifying the amount of shared information between variables. Estimating normalized mutual information using a k-Nearest Neighbor (k-NN) based approach involves the calculation of the scaling-invariant k-NN radius. Calculation of the radius suffers from numerical overflow when the joint dimensionality of the data becomes high, typically in the range of several hundred dimensions. To address this issue, we propose a logarithmic transformation technique that improves the numerical stability of the radius calculation in high-dimensional spaces. By applying the proposed transformation during the calculation of the radius, numerical overflow is avoided, and precision is maintained. The proposed transformation is validated through both theoretical analysis and empirical evaluation, demonstrating its ability to stabilize the calculation without compromising the precision of the results.
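The abstract does not reproduce the radius formula, so the following is only a minimal sketch of the general idea, assuming the overflow arises when a distance is raised to a power on the order of the joint dimensionality. The function names, the radii, and the dimensionality are hypothetical and chosen purely for illustration; this is not the authors' implementation.

```python
import math


def naive_power(radius: float, dim: int) -> float:
    """Raise a radius to the power `dim` directly (illustrative baseline)."""
    try:
        return radius ** dim
    except OverflowError:
        # Python raises instead of returning inf when float ** int overflows.
        return math.inf


def log_power(radius: float, dim: int) -> float:
    """Return log(radius ** dim), computed as dim * log(radius)."""
    return dim * math.log(radius)


def mean_log_power(radii: list[float], dim: int) -> float:
    """Average of log(r_i ** dim) over all samples, kept in the log domain."""
    return sum(log_power(r, dim) for r in radii) / len(radii)


radii = [10.0, 20.0, 5.0]  # made-up k-NN radii (assumption)
dim = 400                  # made-up joint dimensionality (assumption)

print(naive_power(radii[0], dim))   # exceeds the float64 range
print(log_power(radii[0], dim))     # stays finite
print(mean_log_power(radii, dim))   # stays finite
```

Working with `dim * log(radius)` keeps every intermediate value well inside the float64 range, and for small dimensionalities it agrees with taking the logarithm of the directly computed power.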
Problem

Research questions and friction points this paper is trying to address.

Prevent numerical overflow in high-dimensional k-NN radius calculation
Enhance stability of normalized mutual information estimators
Maintain precision in mutual information estimation without computational overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logarithmic transformation for numerical stability
Avoids overflow in high-dimensional radius calculation
Maintains precision without computational overhead
Authors
Marko Tuononen
Nokia Networks, Karaportti 7, 02610 Espoo, Finland and School of Computing, University of Eastern Finland, P.O. Box 111, 80101 Joensuu, Finland
Ville Hautamäki
Associate Professor, University of Eastern Finland
Speaker recognition, language recognition, machine learning, computational biology