BalLOT: Balanced $k$-means clustering with optimal transport

📅 2025-12-05
🤖 AI Summary
This paper addresses balanced $k$-means clustering, that is, $k$-means clustering under the constraint that clusters have equal (or nearly equal) cardinalities. It proposes BalLOT, an alternating minimization algorithm grounded in optimal transport (OT), and shows through numerical experiments that the method is fast and effective. On the theory side, the authors prove that for generic data BalLOT produces integral couplings at every step, give a landscape analysis with guarantees for exact and partial recovery of planted clusters under the stochastic ball model, and propose initialization schemes that achieve one-step recovery of planted clusters.

📝 Abstract
We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
Problem

Research questions and friction points this paper is trying to address.

Balanced k-means clustering via optimal transport
Fast solution with theoretical recovery guarantees
Integral couplings and initialization for cluster recovery
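
For concreteness, one standard way to write the balanced $k$-means problem is shown below. The notation is an assumption made here for illustration and is not taken from the paper: $x_1, \dots, x_n \in \mathbb{R}^d$ are the data points, $c_1, \dots, c_k$ are the centers, $k$ is assumed to divide $n$, and $P$ is the assignment matrix.

$$
\min_{P,\; c_1, \dots, c_k} \; \sum_{i=1}^{n} \sum_{j=1}^{k} P_{ij} \, \lVert x_i - c_j \rVert^2
\quad \text{subject to} \quad
P \in \{0,1\}^{n \times k}, \quad
\sum_{j=1}^{k} P_{ij} = 1 \ \text{for all } i, \quad
\sum_{i=1}^{n} P_{ij} = \frac{n}{k} \ \text{for all } j.
$$

Relaxing $P \in \{0,1\}^{n \times k}$ to $P \ge 0$ turns the assignment step, for fixed centers, into a discrete optimal transport problem (a linear program) whose feasible points are couplings. An integral coupling is one that already satisfies the binary constraint, so it corresponds directly to a balanced clustering.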
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal transport for balanced k-means clustering
Integral couplings at every step for generic data
One-step recovery via specialized initialization schemes
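
To illustrate the general idea of alternating minimization with a balanced assignment step, here is a minimal Python sketch. It is not the authors' implementation: the function name `balanced_kmeans`, its parameters, and the random initialization are hypothetical, the sketch assumes that `k` divides the number of points, and it solves the assignment step with SciPy's `linear_sum_assignment` on replicated centers rather than with a general optimal transport solver. The initialization schemes proposed in the paper are not reproduced.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def balanced_kmeans(X, k, n_iter=20, seed=0):
    """Alternate between a balanced assignment step and a centroid update."""
    n = X.shape[0]
    if n % k != 0:
        raise ValueError("this sketch assumes k divides the number of points")
    size = n // k
    rng = np.random.default_rng(seed)
    # Illustrative initialization: k distinct data points chosen at random.
    centers = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared Euclidean cost between every point and every center: (n, k).
        cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Give each center `size` slots, so a one-to-one matching of points
        # to slots is an integral coupling with equal cluster sizes.
        rows, cols = linear_sum_assignment(np.repeat(cost, size, axis=1))
        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = cols // size
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Centroid step: each center moves to the mean of its assigned points.
        centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers


# Usage on two synthetic blobs of 10 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
labels, centers = balanced_kmeans(X, k=2)
print(np.bincount(labels, minlength=2))  # each cluster receives n/k = 10 points
```

Replicating each center `n/k` times makes the cost matrix square, so the matching is one-to-one and every cluster receives exactly `n/k` points by construction. The matrix has `n * n` entries, which keeps the sketch short but limits it to small datasets.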