SP2OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering

📅 2024-04-04

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Class imbalance (i.e., a long-tailed class distribution) degrades pseudo-label quality in deep clustering, which hurts performance on real-world imbalanced data. Method: The paper proposes Semantic-regularized Progressive Partial Optimal Transport (SP2OT), an optimal transport-based pseudo-label learning framework. SP2OT models pseudo-label generation as a progressive transport of samples to imbalanced clusters, constrained jointly by prior distributions and semantic relations. Using a Majorization-Minimization strategy, the authors reformulate SP2OT as a Progressive Partial Optimal Transport problem, which in turn becomes an unbalanced optimal transport problem with augmented constraints that a fast matrix scaling algorithm solves efficiently. Results: Experiments on long-tailed benchmarks, including a human-curated long-tailed CIFAR100, ImageNet-R, and large-scale subsets of iNaturalist2018, are reported to show that SP2OT outperforms existing deep clustering methods.

📝 Abstract

Deep clustering, which learns representations and semantic clusters without label information, poses a great challenge for deep learning-based approaches. Despite significant progress in recent years, most existing methods focus on uniformly distributed datasets, significantly limiting their practical applicability. In this paper, we propose a more practical problem setting named deep imbalanced clustering, where the underlying classes exhibit an imbalanced distribution. To address this challenge, we introduce a novel optimal transport-based pseudo-label learning framework. Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport (SP$^2$OT) problem, which progressively transports each sample to imbalanced clusters under prior distribution and semantic relation constraints, thus generating high-quality and imbalance-aware pseudo-labels. To solve SP$^2$OT, we develop a Majorization-Minimization-based optimization algorithm. More precisely, we use majorization to reformulate the SP$^2$OT problem as a Progressive Partial Optimal Transport problem, which can be transformed into an unbalanced optimal transport problem with augmented constraints and solved efficiently by a fast matrix scaling algorithm. Experiments on various datasets, including a human-curated long-tailed CIFAR100, the challenging ImageNet-R, and large-scale subsets of the fine-grained iNaturalist2018 dataset, demonstrate the superiority of our method.
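To make the "unbalanced optimal transport solved by matrix scaling" step more concrete, here is a minimal, generic sketch in Python. It is not the authors' algorithm: it omits the progressive mass schedule and the semantic regularization, and it assumes a cost of `-log P` for model predictions `P`, a hard constraint that each sample carries mass `1/N`, and a KL penalty of weight `lam` that pulls cluster sizes toward a uniform prior. The function name `unbalanced_scaling` and all parameters (`eps`, `lam`, `n_iters`) are hypothetical choices for illustration.

```python
import numpy as np

def unbalanced_scaling(cost, a, b, eps=0.1, lam=1.0, n_iters=200):
    """Entropic unbalanced OT via matrix scaling (generic sketch).

    Rows must match `a` exactly; columns are only softly pulled toward `b`
    by a KL penalty of weight `lam` (assumption, not the paper's exact form).
    """
    K = np.exp(-cost / eps)           # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = lam / (lam + eps)         # exponent of the KL-relaxed update
    for _ in range(n_iters):
        v = (b / (K.T @ u)) ** power  # soft column (cluster-size) update
        u = a / (K @ v)               # hard row (per-sample mass) update
    return u[:, None] * K * v[None, :]

# Toy, imbalanced predictions: 12 samples, 3 clusters (made-up data).
rng = np.random.default_rng(0)
logits = rng.normal(size=(12, 3))
logits[:8, 0] += 3.0                  # most samples favour cluster 0
P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

N, C = P.shape
a = np.full(N, 1.0 / N)               # each sample carries mass 1/N
b = np.full(C, 1.0 / C)               # uniform prior over cluster sizes
Q = unbalanced_scaling(-np.log(P), a, b)

pseudo_labels = Q.argmax(axis=1)      # hard pseudo-labels from the plan
cluster_mass = Q.sum(axis=0)          # not forced to equal b
```

Because the column constraint is only a soft penalty, `cluster_mass` can stay imbalanced when the predictions are; raising `lam` pushes it closer to the uniform prior, and lowering it lets the data dominate.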

Problem

Research questions and friction points this paper is trying to address.

Addresses deep imbalanced clustering with uneven class distribution

Proposes optimal transport-based pseudo-label learning framework SP2OT

Solves SP2OT via a Majorization-Minimization-based optimization algorithm

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-regularized Progressive Partial Optimal Transport

Majorization-Minimization-based optimization algorithm for SP2OT

Unbalanced optimal transport with augmented constraints

🔎 Similar Papers

Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

2024-02-28, AAAI Conference on Artificial Intelligence, Citations: 1
