Normalized mutual information is a biased measure for classification and community detection

📅 2023-07-03
🏛️ Nature Communications
📈 Citations: 12
Influential: 3
🤖 AI Summary
Normalized Mutual Information (NMI), a standard similarity measure for evaluating clustering, classification, and community detection output, is biased for two reasons: (i) it ignores the information content of the contingency table relating the candidate labeling to the ground truth, and (ii) its symmetric normalization introduces a spurious dependence on the algorithm's output. Method: the authors analyze the information-theoretic origins of both biases and introduce a modified mutual information that remedies them while retaining the statistical content of the contingency table. Contribution/Results: in extensive numerical tests on a basket of popular network community detection algorithms, the biases of the traditional measure significantly affect conclusions about which algorithm performs best, whereas the corrected measure yields more reliable comparisons.
📝 Abstract
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one’s conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
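
For reference, the quantity under discussion is usually defined as below; the arithmetic-mean normalization shown here is one common convention (other variants divide by the maximum or the geometric mean of the entropies), and the paper's precise formulation should be taken from the text itself.

$$\mathrm{NMI}(g,c) \;=\; \frac{2\,I(g;c)}{H(g)+H(c)}, \qquad I(g;c) \;=\; \sum_{r,s} \frac{n_{rs}}{n}\,\log\frac{n\, n_{rs}}{n_r\, n_s},$$

where $n_{rs}$ is the contingency-table count of objects with ground-truth label $r$ and candidate label $s$, $n_r$ and $n_s$ are its row and column sums, and $H(g)$, $H(c)$ are the Shannon entropies of the two label distributions. Because the denominator involves $H(c)$, the score depends on the candidate labeling through more than the shared information alone, which is the spurious dependence on algorithm output that the abstract refers to.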
Problem

Research questions and friction points this paper is trying to address.

Normalized mutual information returns biased similarity scores when used as an evaluation measure
An unbiased variant of the mutual information is needed
The biases can change conclusions about which clustering or community detection algorithm performs best
Innovation

Methods, ideas, or system contributions that make the work stand out.

A modified mutual information that corrects the normalization bias of standard NMI
Accounts for the information content of the contingency table, which standard NMI ignores
Removes the spurious dependence on algorithm output introduced by symmetric normalization (see the toy sketch below)
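
As a toy illustration of the kind of artifact the paper targets (a sketch using scikit-learn's standard arithmetic-mean NMI, not the paper's own experiments or its corrected measure), a candidate labeling that shatters the data into singleton clusters receives a clearly nonzero NMI against the ground truth even though it expresses no real community structure, while a random two-group labeling scores near zero:

```python
# Toy illustration (not the paper's experiment): under the common
# arithmetic-mean NMI, an uninformative over-partitioning can still
# score well above zero.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)

# Ground truth: two equal communities of 50 nodes each.
truth = np.repeat([0, 1], 50)

# Candidate A: a random 2-group labeling, unrelated to the truth.
random_two_groups = rng.integers(0, 2, size=100)

# Candidate B: every node in its own singleton cluster.
singletons = np.arange(100)

print("NMI(truth, random 2 groups):",
      normalized_mutual_info_score(truth, random_two_groups))  # close to 0
print("NMI(truth, singletons):    ",
      normalized_mutual_info_score(truth, singletons))         # roughly 0.26
```

The paper's modified measure is designed, among other things, not to reward such uninformative refinements; its exact construction is given in the paper.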
Authors

Maximilian Jerdee
Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
Santa Fe Institute, Santa Fe, New Mexico 87501, USA

Alec Kirkley
University of Hong Kong
Statistical Physics, Network Science, Statistical Inference, Urban Science, Complex Systems

Mark Newman
Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan 48109, USA