🤖 AI Summary
This work addresses the limitations of K-means clustering on nonlinearly separable data, where existing post-hoc merging strategies are often computationally inefficient and depend on hyperparameter tuning. To overcome these issues, the authors propose CavMerge, a parameter-free and computationally efficient algorithm that merges the clusters produced by K-means run with an initially large K. CavMerge builds on a local log-concavity assumption and pairs an efficient post-processing mechanism with statistical guarantees, achieving strong consistency and rapid convergence under mild distributional assumptions. Extensive experiments on synthetic and real-world datasets show that CavMerge consistently outperforms state-of-the-art methods, yielding more stable and reliable clustering results.
📝 Abstract
K-means clustering, a classic and widely used clustering technique, is known to perform poorly on non-linearly separable data. Numerous adjustments and modifications have been proposed to address this issue, including methods that merge the K-means results obtained with a relatively large K into a final cluster assignment. However, existing methods of this kind are often computationally inefficient and sensitive to hyperparameter tuning. Here we present \emph{CavMerge}, a novel K-means merging algorithm that is intuitive, free of parameter tuning, and computationally efficient. Under minimal local distributional assumptions, our algorithm enjoys strong consistency and rapid convergence guarantees. Empirical studies on a variety of simulated and real datasets show that our method yields more reliable clusters than current state-of-the-art algorithms.
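The abstract does not spell out CavMerge's actual merging rule, so the sketch below only illustrates the general over-cluster-then-merge idea it describes: run plain K-means with a deliberately large K, then greedily merge pairs of clusters whose joint 1-D projection (onto the line joining their centroids) shows no density valley, a crude proxy for the local unimodality that log-concavity implies. The valley test, its 0.5 threshold, and the window widths here are arbitrary choices for the demo and are not the paper's (parameter-free) criterion.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm (stand-in for any off-the-shelf K-means)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels, centers

def valley_ratio(X, labels, a, b, centers):
    # Project the points of clusters a and b onto the line joining their
    # centroids, then compare the point count near the midpoint with the
    # counts near the two centroids. A unimodal (e.g. log-concave) profile
    # has no valley at the midpoint, so a ratio near 1 argues for merging;
    # a ratio near 0 indicates two separate modes. Window width 0.1 is an
    # arbitrary demo choice, not part of the actual method.
    pts = X[(labels == a) | (labels == b)]
    d = centers[b] - centers[a]
    t = (pts - centers[a]) @ d / (d @ d)  # position along the segment, 0..1
    mid = np.sum(np.abs(t - 0.5) < 0.1)
    ends = 0.5 * (np.sum(np.abs(t) < 0.1) + np.sum(np.abs(t - 1.0) < 0.1))
    return mid / max(ends, 1.0)

def merge_unimodal(X, labels, centers, thresh=0.5):
    # Greedily merge any pair of clusters whose joint projection shows no
    # valley; repeat until every remaining pair looks bimodal.
    labels = labels.copy()
    active = list(np.unique(labels))
    changed = True
    while changed and len(active) > 1:
        changed = False
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                a, b = active[i], active[j]
                if valley_ratio(X, labels, a, b, centers) > thresh:
                    labels[labels == b] = a           # absorb b into a
                    centers[a] = X[labels == a].mean(axis=0)
                    active.pop(j)
                    changed = True
                    break
            if changed:
                break
    return labels

# Two well-separated Gaussian blobs, deliberately over-clustered with K=6.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (200, 2)),
               rng.normal([10.0, 0.0], 0.5, (200, 2))])
labels, centers = kmeans(X, k=6, seed=1)
merged = merge_unimodal(X, labels, centers)
```

On data like this, the sub-clusters within each blob project to a flat, valley-free profile and get merged, while any cross-blob pair shows an empty midpoint region and stays apart.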