PET-TURTLE: Deep Unsupervised Support Vector Machines for Imbalanced Data Clusters

📅 2026-01-06
🏛️ IEEE Signal Processing Letters
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This work addresses suboptimal separating hyperplanes and high clustering error when deep unsupervised clustering is applied to class-imbalanced data. It proposes PET-TURTLE, an extension of the TURTLE algorithm that adds a power-law prior to the deep unsupervised SVM framework and uses sparse logits for label assignment; together these curb the over-prediction of minority clusters. Experiments on synthetic and real-world data show improved clustering accuracy on imbalanced datasets, and the sparse logits also improve accuracy on balanced datasets.
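To make the idea of a power-law prior over cluster sizes concrete, here is a minimal NumPy sketch. It is not the paper's cost function: the names `power_law_prior` and `kl_to_prior`, the exponent `alpha`, and the use of a KL term against the sorted average assignment are all assumptions chosen for illustration.

```python
import numpy as np

def power_law_prior(num_clusters: int, alpha: float) -> np.ndarray:
    """Return p_k proportional to k**(-alpha) for ranks k = 1..num_clusters."""
    ranks = np.arange(1, num_clusters + 1, dtype=float)
    weights = ranks ** (-alpha)
    return weights / weights.sum()

def kl_to_prior(assignments: np.ndarray, prior: np.ndarray, eps: float = 1e-12) -> float:
    """KL(mean soft assignment || prior), with cluster masses sorted largest first.

    `assignments` has shape (num_samples, num_clusters) and rows that sum to 1.
    Sorting is an illustrative choice so that cluster order does not matter.
    """
    marginal = np.sort(assignments.mean(axis=0))[::-1]
    return float(np.sum(marginal * (np.log(marginal + eps) - np.log(prior + eps))))

# alpha = 0 gives a uniform prior, i.e. the balanced-cluster assumption.
uniform = power_law_prior(4, alpha=0.0)
# A larger alpha (hypothetical value) puts most of the mass on a few clusters.
skewed = power_law_prior(4, alpha=1.5)
```

In this sketch, a regularizer such as `kl_to_prior` would penalize cluster-size distributions that stray from the chosen prior, so setting `alpha` above zero stops pushing the clusters toward equal sizes.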

πŸ“ Abstract
Foundation vision, audio, and language models enable zero-shot performance on downstream tasks via their latent representations. Recently, unsupervised learning of data group structure with deep learning methods has gained popularity. TURTLE, a state-of-the-art deep clustering algorithm, uncovers data labels without supervision by alternating label and hyperplane updates and maximizing the hyperplane margin, in a similar fashion to support vector machines (SVMs). However, TURTLE assumes clusters are balanced; when data is imbalanced, it yields non-ideal hyperplanes that cause higher clustering error. We propose PET-TURTLE, which generalizes the cost function to handle imbalanced data distributions through a power-law prior. Additionally, by introducing sparse logits in the labeling process, PET-TURTLE optimizes over a simpler search space, which in turn improves accuracy for balanced datasets. Experiments on synthetic and real data show that PET-TURTLE improves accuracy for imbalanced sources, prevents over-prediction of minority clusters, and enhances overall clustering.
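The abstract does not say how the sparse logits are produced. One well-known way to turn logits into sparse label probabilities is sparsemax (Martins and Astudillo, 2016), sketched below as an assumed stand-in rather than PET-TURTLE's actual mechanism. Unlike softmax, it can assign exactly zero probability to clusters, which is one sense in which the label search space becomes simpler.

```python
import numpy as np

def sparsemax(logits: np.ndarray) -> np.ndarray:
    """Euclidean projection of each row of a 2-D `logits` array onto the simplex."""
    z = np.asarray(logits, dtype=float)
    z_sorted = np.sort(z, axis=-1)[:, ::-1]            # each row in descending order
    cumsum = np.cumsum(z_sorted, axis=-1)
    k = np.arange(1, z.shape[-1] + 1)
    support = 1.0 + k * z_sorted > cumsum              # sorted entries that stay nonzero
    k_max = support.sum(axis=-1, keepdims=True)        # support size per row
    # Threshold tau is chosen so that the surviving entries sum to one.
    tau = (np.take_along_axis(cumsum, k_max - 1, axis=-1) - 1.0) / k_max
    return np.maximum(z - tau, 0.0)

# Hypothetical logits for two samples over three clusters.
probs = sparsemax(np.array([[2.0, 1.0, -1.0],
                            [0.5, 0.3, 0.2]]))
```

Each row of `probs` sums to one. Rows with one clearly dominant logit collapse to a hard assignment, while rows with close logits keep a soft assignment.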
Problem

Research questions and friction points this paper is trying to address.

imbalanced data
deep clustering
unsupervised learning
support vector machines
cluster imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

imbalanced data clustering
deep unsupervised learning
power law prior
sparse logits
support vector machines