🤖 AI Summary
This paper addresses balanced $k$-means clustering, i.e., $k$-means clustering under the constraint that all clusters have equal (or nearly equal) cardinalities. To this end, it proposes BalLOT, an alternating minimization algorithm grounded in optimal transport (OT), in which a coupling matrix between data points and clusters encodes the assignment. Numerical experiments show that BalLOT is a fast and effective solver. Theoretically, the paper proves that for generic data BalLOT produces integral couplings at every iteration, so each step yields a genuine balanced partition rather than a fractional one. It then analyzes the optimization landscape under the stochastic ball model, establishing guarantees for both exact and partial recovery of planted clusters. Finally, it proposes initialization schemes that recover the planted clusters in a single step.
📝 Abstract
We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
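To make the alternating scheme concrete, here is a minimal sketch. It is not the authors' BalLOT implementation. It assumes squared Euclidean cost and a number of points $n$ that is a multiple of $k$. The balanced assignment step is solved exactly with SciPy's `linear_sum_assignment` by replicating each center $n/k$ times, so the resulting coupling is always a permutation and therefore integral. The helper names (`sample_stochastic_ball`, `balanced_assign`, `balanced_kmeans`) and the "one sample per ball" initialization are hypothetical. The sampler draws points uniformly from unit balls, which is one common instance of the stochastic ball model.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sample_stochastic_ball(centers, m, rng):
    """Hypothetical sampler: m points drawn uniformly from the unit ball around each center."""
    k, d = centers.shape
    # Uniform direction: normalize standard Gaussian vectors.
    g = rng.standard_normal((k * m, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    # Radius U^(1/d) gives a uniform distribution over the d-dimensional ball.
    r = rng.random(k * m) ** (1.0 / d)
    labels = np.repeat(np.arange(k), m)
    return centers[labels] + g * r[:, None], labels


def balanced_assign(X, C):
    """Assignment step: send exactly n/k points to each of the k centers at minimum cost.

    Replicating each center n/k times turns the balanced transport problem into a
    square linear assignment problem whose optimum is a permutation (an integral coupling).
    """
    n, k = X.shape[0], C.shape[0]
    if n % k:
        raise ValueError("this sketch assumes n is a multiple of k")
    m = n // k
    cost = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    big = np.repeat(cost, m, axis=1)  # (n, n); column j is a copy of center j // m
    rows, cols = linear_sum_assignment(big)
    labels = np.empty(n, dtype=int)
    labels[rows] = cols // m
    return labels


def balanced_kmeans(X, k, C0, iters=20):
    """Alternate balanced assignment and centroid updates until the labels stop changing."""
    C = C0.astype(float).copy()
    labels = None
    for _ in range(iters):
        new_labels = balanced_assign(X, C)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Centroid step: mean of each equal-sized cluster.
        C = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, C


# Usage on well-separated planted clusters (illustrative parameters).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X, truth = sample_stochastic_ball(centers, 20, rng)
# Hypothetical initialization: one sample from each planted ball.
labels, C = balanced_kmeans(X, 3, X[[0, 20, 40]])
```

Replicating centers makes the assignment problem $n \times n$, and the Hungarian-type solver then costs $O(n^3)$. This keeps the sketch short and exact but is only practical for small $n$; a dedicated network-flow or OT solver on the $n \times k$ problem would scale better.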