🤖 AI Summary
This work systematically investigates the lack of bit-level reproducibility in K-Means, Ward, and DBSCAN clustering algorithms under multithreaded execution. To isolate sources of non-determinism, we decouple each algorithm’s computational stages and analyze their implementations in scikit-learn alongside OpenMP’s numerical determinism guarantees. We identify, for the first time, that K-Means fails to reproduce bit-identical results when using more than two OpenMP threads—due to non-deterministic reduction ordering in parallel distance computations. In contrast, Ward and DBSCAN exhibit higher reproducibility under default configurations. Based on these findings, we propose a stage-wise reproducibility assessment framework for clustering algorithms and derive practical, science-oriented guidelines for achieving reproducible clustering. Our study advances awareness of reproducibility risks in foundational machine learning algorithms and provides both theoretical insights and engineering evidence to support robust algorithm design and trustworthy AI development.
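The reduction-ordering effect named above can be illustrated without any clustering code. The following is a minimal sketch, not scikit-learn's or OpenMP's actual implementation: the helper `reduce_in_chunks` is hypothetical and only mimics how splitting a sum into per-thread partial sums changes the grouping of floating-point additions.

```python
def reduce_in_chunks(values, n_chunks):
    """Sum `values` as n_chunks partial sums that are then combined.

    Hypothetical helper: loosely mimics a per-thread reduction, where each
    thread accumulates its own slice before the partial sums are merged.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = []
    for start in range(0, len(values), size):
        partial = 0.0
        for v in values[start:start + size]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total


# Floating-point addition is not associative, so the grouping matters.
values = [1e16, 1.0, -1e16, 1.0]
one_chunk = reduce_in_chunks(values, 1)   # plain left-to-right sum
two_chunks = reduce_in_chunks(values, 2)  # two partial sums, then combined

print(one_chunk, two_chunks, one_chunk == two_chunks)
```

The same inputs give different totals depending only on how they are grouped. If a parallel runtime does not fix that grouping, bit-identical output is not guaranteed from run to run.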
📝 Abstract
Reproducibility is essential in machine learning because it ensures that a model or experiment yields the same scientific conclusion. For some algorithms, repeatability with bitwise identical results is also key to scientific integrity because it enables debugging. We decompose three widely used clustering algorithms, K-Means, DBSCAN and Ward, into their fundamental steps and identify the conditions required to achieve repeatability at each stage. Using the Python library scikit-learn as an example implementation, we examine which aspects of each method are repeatable. Our results reveal inconsistent K-Means output when the number of OpenMP threads exceeds two. This work aims to raise awareness of this issue among both users and developers, encouraging further investigation and potential fixes.
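Bitwise repeatability is a stricter test than agreement within a tolerance. One way to check it is sketched below using only the standard library. The names `fingerprint` and `is_bitwise_repeatable` are illustrative assumptions, not part of scikit-learn or of the paper's tooling.

```python
import hashlib
import math
import struct


def fingerprint(values):
    """SHA-256 over the IEEE 754 binary64 encoding of each float in `values`."""
    digest = hashlib.sha256()
    for v in values:
        digest.update(struct.pack("<d", v))
    return digest.hexdigest()


def is_bitwise_repeatable(run, n_runs=5):
    """Call `run()` n_runs times; True only if all results are bit-identical."""
    return len({fingerprint(run()) for _ in range(n_runs)}) == 1


# Two results that agree within a tolerance but differ in their last bit.
a = [0.1 + 0.2]
b = [0.3]
close = math.isclose(a[0], b[0], rel_tol=1e-12)
identical = fingerprint(a) == fingerprint(b)

print(close, identical)
```

With scikit-learn, `run` could fit `KMeans` with a fixed `random_state` inside `threadpoolctl.threadpool_limits(limits=n, user_api="openmp")` and return `cluster_centers_.ravel().tolist()`. That setup is an assumption about how one might probe the thread-count effect, not the authors' protocol.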