PET-TURTLE: Deep Unsupervised Support Vector Machines for Imbalanced Data Clusters

📅 2026-01-06

🏛️ IEEE Signal Processing Letters

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses a weakness of deep unsupervised clustering on class-imbalanced data: the learned separating hyperplanes are suboptimal, which raises clustering error. The authors propose PET-TURTLE, an extension of the TURTLE algorithm that adds a power-law prior to the deep unsupervised SVM framework and uses sparse logits for label assignment. Together, these changes reduce the over-prediction of minority clusters. Experiments on synthetic and real-world data show higher clustering accuracy on imbalanced datasets, and the sparse logits also improve accuracy on balanced ones.
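To make the idea of a power-law prior on cluster sizes concrete, here is a minimal sketch and not the paper's actual cost function. It assumes the prior takes the form π_k ∝ k^(-α) over size-ranked clusters and that it enters the objective as a KL penalty on the average soft assignment. The names `power_law_prior`, `kl_to_prior`, and `alpha` are hypothetical.

```python
import math

def power_law_prior(num_clusters, alpha):
    """Assumed prior: pi_k proportional to k^(-alpha) for k = 1..K (alpha >= 0).

    alpha = 0 gives the uniform (balanced) prior; larger alpha is more skewed.
    """
    weights = [k ** (-alpha) for k in range(1, num_clusters + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_to_prior(soft_labels, prior, eps=1e-12):
    """KL(average soft assignment || prior).

    Cluster indices are arbitrary in unsupervised learning, so this sketch
    sorts the average assignment in descending order before comparing it
    with the (already descending) prior. That matching rule is an assumption.
    """
    n = len(soft_labels)
    k = len(prior)
    marginal = [sum(row[j] for row in soft_labels) / n for j in range(k)]
    marginal_sorted = sorted(marginal, reverse=True)
    return sum(
        m * (math.log(m + eps) - math.log(p + eps))
        for m, p in zip(marginal_sorted, prior)
    )

# Usage: a balanced assignment is penalized under a skewed prior.
prior = power_law_prior(3, alpha=1.0)
balanced = [[1 / 3, 1 / 3, 1 / 3] for _ in range(5)]
penalty = kl_to_prior(balanced, prior)
```

With `alpha = 0` the penalty reduces to the balanced-cluster case, which is one way to read the claim that the method generalizes the original cost function.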

📝 Abstract

Foundation vision, audio, and language models enable zero-shot performance on downstream tasks via their latent representations. Recently, unsupervised learning of data group structure with deep learning methods has gained popularity. TURTLE, a state-of-the-art deep clustering algorithm, uncovers data labeling without supervision by alternating label and hyperplane updates, maximizing the hyperplane margin in a fashion similar to support vector machines (SVMs). However, TURTLE assumes clusters are balanced; when data is imbalanced, it yields non-ideal hyperplanes that cause higher clustering error. We propose PET-TURTLE, which generalizes the cost function to handle imbalanced data distributions through a power-law prior. Additionally, by introducing sparse logits in the labeling process, PET-TURTLE optimizes a simpler search space, which in turn improves accuracy for balanced datasets. Experiments on synthetic and real data show that PET-TURTLE improves accuracy for imbalanced sources, prevents over-prediction of minority clusters, and enhances overall clustering.
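The abstract does not say how the logits are made sparse. As an illustration only, the sketch below uses top-k masking followed by a softmax as a stand-in; the function name `sparse_softmax` and the parameter `keep` are hypothetical and not taken from the paper. It shows how sparsity shrinks the set of candidate labels for each sample.

```python
import math

def sparse_softmax(logits, keep):
    """Softmax over the `keep` largest logits; every other entry gets probability 0.

    Top-k masking is an assumed stand-in for the paper's sparse logits.
    """
    if not 1 <= keep <= len(logits):
        raise ValueError("keep must be between 1 and len(logits)")
    # Indices of the `keep` largest logits (ties keep their original order).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:keep]
    # Subtract the largest kept logit for numerical stability.
    peak = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]

# Usage: only two of the four clusters remain candidates for this sample.
probs = sparse_softmax([2.0, 0.5, 1.0, -1.0], keep=2)
```

Because masked clusters receive exactly zero probability, each sample's label is searched over `keep` clusters instead of all of them, which is one plausible reading of the "simpler search space" the abstract describes.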

Problem

Research questions and friction points this paper is trying to address.

imbalanced data

deep clustering

unsupervised learning

support vector machines

cluster imbalance

Innovation

Methods, ideas, or system contributions that make the work stand out.

imbalanced data clustering

deep unsupervised learning

power law prior

sparse logits

support vector machines
