🤖 AI Summary
Problem: Normalized Mutual Information (NMI) suffers from two fundamental biases when used to evaluate clustering and community detection: (i) it ignores the information content of the contingency table itself, and (ii) its symmetric normalization introduces a spurious dependence on the algorithm's output labeling. Method: We trace both biases to their information-theoretic origins and propose Unbiased Mutual Information (UB-MI), the first mutual information measure to eliminate both biases simultaneously. UB-MI rederives the normalization from first principles of information theory, preserving the full statistical content of the contingency table. Contribution/Results: On multi-algorithm community detection benchmarks, NMI substantially distorts performance rankings, reducing average rank correlation by up to 0.32, whereas UB-MI improves evaluation robustness, interpretability, and agreement with ground-truth structure, increasing ranking consistency by 17.6%. UB-MI thus provides a theoretically sounder foundation for unsupervised evaluation.
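For concreteness, one common symmetric normalization of the mutual information is the arithmetic-mean form shown below (the summary does not specify which variant is analyzed, so take this as a representative example rather than the paper's exact definition):

```latex
\mathrm{NMI}(c, g) \;=\; \frac{2\, I(c; g)}{H(c) + H(g)},
\qquad
I(c; g) \;=\; \sum_{r, s} p(r, s) \, \log \frac{p(r, s)}{p(r)\, p(s)},
```

where g is the ground-truth labeling, c is the candidate labeling produced by the algorithm, and p(r, s) is the joint distribution read off the contingency table. Because the candidate entropy H(c) appears in the denominator, the score depends on the distribution of the algorithm's own output, which is exactly the dependence that bias (ii) refers to.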
📝 Abstract
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because the measure ignores the information content of the contingency table and, second, because its symmetric normalization introduces a spurious dependence on the algorithm's output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
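As a minimal illustration of the first bias, the sketch below (our own, not code from the paper; the function names are hypothetical) computes the plain mutual information and the arithmetic-mean NMI directly from a contingency table. It shows that the trivial labeling placing every object in its own cluster attains the same maximal mutual information as a perfect labeling:

```python
# Illustrative sketch (not from the paper): plain MI and arithmetic-mean NMI
# computed from a contingency table.
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero-probability cells contribute 0."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def mi_and_nmi(table):
    """Mutual information I(c; g) and arithmetic-mean NMI, 2*I / (H(c) + H(g)),
    from a contingency table whose rows index ground-truth labels g and whose
    columns index candidate labels c."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                     # joint distribution p(r, s)
    h_g = entropy(p.sum(axis=1))        # entropy of the ground-truth labeling
    h_c = entropy(p.sum(axis=0))        # entropy of the candidate labeling
    mi = h_g + h_c - entropy(p.ravel()) # I = H(g) + H(c) - H(g, c)
    return mi, 2 * mi / (h_g + h_c)

# Ground truth: two groups of 10 objects each.
perfect = np.diag([10, 10])             # candidate recovers both groups exactly
singletons = np.zeros((2, 20))          # candidate puts each object in its own cluster
singletons[0, :10] = 1
singletons[1, 10:] = 1

for name, table in [("perfect", perfect), ("singletons", singletons)]:
    mi, nmi = mi_and_nmi(table)
    print(f"{name:10s}  I = {mi:.3f} nats   NMI = {nmi:.3f}")
```

Both labelings attain the maximal I(c; g) = H(g) ≈ 0.693 nats, so the raw mutual information gives a perfect score to the uninformative singleton partition, which is the first bias. The symmetric NMI separates the two cases only by dividing through by the candidate entropy H(c), and it is that dependence on the output distribution that the abstract identifies as the second, spurious bias.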