🤖 AI Summary
Problem: Most deep clustering methods assume uniformly distributed classes. Under class imbalance (a long-tailed class distribution), the pseudo-labels they generate degrade in quality, which limits performance on real-world imbalanced data.
Method: We propose Semantic-regularized Progressive Partial Optimal Transport (SP$^2$OT), a novel optimal transport-based pseudo-labeling framework for deep imbalanced clustering. SP$^2$OT formulates pseudo-label generation as a progressive transport process that moves each sample to imbalanced clusters under prior-distribution and semantic-relation constraints. A Majorization-Minimization algorithm solves it: the majorization step reformulates SP$^2$OT as a Progressive Partial Optimal Transport problem, which is then converted into an unbalanced optimal transport problem with augmented constraints and solved efficiently with a fast matrix scaling algorithm.
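A fast matrix scaling algorithm for optimal transport is typically a Sinkhorn-style iteration. The NumPy sketch below is a generic unbalanced OT solver in the style of Chizat et al. (2018), not the authors' SP$^2$OT code: the function name `unbalanced_sinkhorn`, the cost `-log(probs)`, the entropic weight `eps`, and the KL penalty `lam` on cluster sizes are all hypothetical choices. It illustrates why relaxing the cluster marginal matters: with an exact uniform marginal, confident samples of a head class would be pushed into other clusters, whereas the relaxed version lets cluster sizes follow the predictions.

```python
import numpy as np


def unbalanced_sinkhorn(cost, a, b, eps=0.1, lam=1.0, n_iters=500):
    """Entropic OT with an exact row (sample) marginal `a` and a KL-relaxed
    column (cluster) marginal `b`, solved by generalized Sinkhorn scaling.

    cost: (N, K) cost matrix, here -log of predicted class probabilities.
    Returns the (N, K) transport plan Q; its row sums equal `a`.
    """
    kernel = np.exp(-cost / eps)           # Gibbs kernel of the cost
    fi = lam / (lam + eps)                 # exponent induced by the KL penalty
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        v = (b / (kernel.T @ u)) ** fi     # soft projection onto cluster sizes
        u = a / (kernel @ v)               # exact projection onto sample masses
    return u[:, None] * kernel * v[None, :]


# Toy example: 4 of 6 samples confidently predict cluster 0.
probs = np.array([[0.8, 0.1, 0.1]] * 4 + [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
a = np.full(6, 1 / 6)                      # each sample carries mass 1/N
b = np.full(3, 1 / 3)                      # uniform prior over clusters
Q = unbalanced_sinkhorn(-np.log(probs), a, b)
pseudo_labels = Q.argmax(axis=1)
print(Q.sum(axis=0))                       # cluster sizes need not equal 1/3
print(pseudo_labels)
```

Setting `lam` large drives `fi` toward 1 and recovers balanced Sinkhorn with equal-sized clusters; a smaller `lam` lets the plan drift further from the uniform prior.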
Results: On long-tailed benchmarks, including a human-curated long-tailed CIFAR-100, ImageNet-R, and large-scale subsets of the fine-grained iNaturalist2018 dataset, SP$^2$OT outperforms existing deep clustering methods, indicating that combining semantic regularization with partial transport handles class imbalance across datasets of different scales.
📝 Abstract
Deep clustering, which learns representations and semantic clusters without label information, poses a great challenge for deep learning-based approaches. Despite significant progress in recent years, most existing methods focus on uniformly distributed datasets, which significantly limits their practical applicability. In this paper, we propose a more practical problem setting named deep imbalanced clustering, where the underlying classes exhibit an imbalanced distribution. To address this challenge, we introduce a novel optimal transport-based pseudo-label learning framework. Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport (SP$^2$OT) problem, which progressively transports each sample to imbalanced clusters under several prior-distribution and semantic-relation constraints, thus generating high-quality and imbalance-aware pseudo-labels. To solve SP$^2$OT, we develop a Majorization-Minimization-based optimization algorithm. Specifically, we use majorization to reformulate the SP$^2$OT problem as a Progressive Partial Optimal Transport problem, which can be transformed into an unbalanced optimal transport problem with augmented constraints and solved efficiently by a fast matrix scaling algorithm. Experiments on various datasets, including a human-curated long-tailed CIFAR100, the challenging ImageNet-R, and large-scale subsets of the fine-grained iNaturalist2018 dataset, demonstrate the superiority of our method.
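The step from partial to unbalanced OT with augmented constraints can be pictured with a virtual cluster. In the hypothetical sketch below (the function `partial_ot_plan`, the parameters `rho`, `eps`, `lam`, the linear `rho` ramp, and the sample-selection rule are all illustrative assumptions), a zero-cost extra column absorbs the 1 - `rho` of mass not sent to real clusters, real-cluster sizes are KL-relaxed toward a uniform prior, and the virtual cluster's size is enforced exactly. It omits the semantic-relation regularizer and the majorization step, so it shows only the augmented-constraint idea, not the full SP$^2$OT solver.

```python
import numpy as np


def partial_ot_plan(probs, rho, eps=0.1, lam=1.0, n_iters=1000):
    """Hypothetical sketch: partial OT recast as unbalanced OT.

    Columns 0..K-1 are real clusters whose sizes are KL-relaxed toward
    rho / K; column K is a zero-cost virtual cluster whose size is held
    exactly at 1 - rho, so only a fraction rho of the mass is transported
    to real clusters.
    """
    n, k = probs.shape
    cost = np.hstack([-np.log(probs), np.zeros((n, 1))])  # virtual column costs 0
    a = np.full(n, 1.0 / n)                               # each sample carries 1/N
    b = np.append(np.full(k, rho / k), 1.0 - rho)         # augmented cluster marginal
    fi = np.append(np.full(k, lam / (lam + eps)), 1.0)    # exponent 1 => exact marginal
    kernel = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(k + 1)
    for _ in range(n_iters):
        v = (b / (kernel.T @ u)) ** fi                    # per-column scaling
        u = a / (kernel @ v)                              # exact sample masses
    return u[:, None] * kernel * v[None, :]


probs = np.array([[0.9, 0.1], [0.1, 0.9], [0.6, 0.4], [0.5, 0.5]])
Q = partial_ot_plan(probs, rho=0.5)
real_mass = Q[:, :-1].sum(axis=1)          # mass each sample sends to real clusters
labels = Q[:, :-1].argmax(axis=1)
# Hypothetical selection rule: keep samples that send over half their mass to real clusters.
selected = real_mass > 0.5 * Q.sum(axis=1)
print(Q[:, -1].sum())                      # close to 1 - rho at convergence
print(labels, selected)

# Progressive variant with a hypothetical linear ramp of rho over training.
for rho in np.linspace(0.2, 1.0, 5):
    Q_t = partial_ot_plan(probs, rho)
    print(f"rho={rho:.2f}  mass on real clusters={Q_t[:, :-1].sum():.3f}")
```

Because the virtual column costs nothing, it preferentially absorbs low-confidence samples (those with the highest `-log(probs)` cost to every real cluster), so small values of `rho` keep only the most confident pseudo-labels; raising `rho` over training gradually admits more samples.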