🤖 AI Summary
Unsupervised image classification suffers from insufficient synergy between representation learning and clustering, and existing methods often neglect feature diversity when the backbone model is frozen. This paper proposes a multi-head clustering framework with three key components: (1) adaptive nearest-neighbor selection; (2) a cluster ensembling mechanism that resolves conflicts between heads to produce robust consensus pseudo-labels; and (3) a lightweight classifier trained solely on those pseudo-labels. Crucially, the method requires no fine-tuning of the backbone network. It reaches 70.4% top-1 accuracy on ImageNet, which the authors report as the first fully unsupervised result above 70%, along with 99.3% on CIFAR-10 and 89.0% on CIFAR-100. The approach outperforms prior work on all ten benchmark datasets, narrowing the performance gap with supervised counterparts.
📝 Abstract
Unsupervised image classification, or image clustering, aims to group unlabeled images into semantically meaningful categories. Early methods integrated representation learning and clustering within an iterative framework. However, the rise of foundation models has recently shifted the focus to clustering alone, bypassing the representation learning step. In this work, we build upon a recent multi-head clustering approach by introducing adaptive nearest neighbor selection and cluster ensembling strategies to improve clustering performance. Our method, "Image Clustering through Cluster Ensembles" (ICCE), begins with a clustering stage, where we train multiple clustering heads on a frozen backbone, producing diverse image clusterings. We then employ a cluster ensembling technique to consolidate these potentially conflicting results into a unified consensus clustering. Finally, we train an image classifier using the consensus clustering result as pseudo-labels. ICCE achieves state-of-the-art performance on ten image classification benchmarks, reaching 99.3% accuracy on CIFAR-10, 89% on CIFAR-100, and 70.4% on ImageNet, narrowing the performance gap with supervised methods. To the best of our knowledge, ICCE is the first fully unsupervised image classification method to exceed 70% accuracy on ImageNet.
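The abstract does not say which ensembling technique ICCE uses, so the following is only an illustrative sketch of the general idea of turning several conflicting clusterings into one set of pseudo-labels. It assumes each head outputs integer cluster ids in `range(num_clusters)`, aligns every head to the first one by greedy overlap matching, and then takes a per-sample majority vote. The function names `align_labels` and `consensus_labels` are hypothetical, and greedy matching is a simplification (optimal one-to-one matching, for example with `scipy.optimize.linear_sum_assignment`, is a common alternative).

```python
from collections import Counter


def align_labels(reference, labels, num_clusters):
    """Relabel `labels` so its cluster ids overlap `reference` as much as possible.

    Greedy one-to-one matching; an assumption for illustration, not ICCE's method.
    """
    # overlap[(src, dst)] = number of samples with id `src` in `labels`
    # and id `dst` in `reference`.
    overlap = Counter(zip(labels, reference))
    mapping = {}
    used = set()
    # Visit id pairs from largest to smallest overlap and match them one-to-one.
    for (src, dst), _count in overlap.most_common():
        if src not in mapping and dst not in used:
            mapping[src] = dst
            used.add(dst)
    # Clusters that found no partner receive one of the remaining free ids.
    free = [c for c in range(num_clusters) if c not in used]
    for src in sorted(set(labels)):
        if src not in mapping:
            mapping[src] = free.pop(0)
    return [mapping[label] for label in labels]


def consensus_labels(head_labels, num_clusters):
    """Majority vote over heads after aligning every head to the first one."""
    reference = head_labels[0]
    aligned = [list(reference)]
    for labels in head_labels[1:]:
        aligned.append(align_labels(reference, labels, num_clusters))
    # Ties are resolved in favour of the earliest head that cast a tied vote.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


# Three hypothetical heads over six images: the second is a pure relabeling
# of the first, and the third disagrees on one image.
heads = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 2, 2, 0, 0],
    [2, 2, 0, 1, 1, 1],
]
pseudo_labels = consensus_labels(heads, num_clusters=3)
```

In a pipeline like the one described, the resulting `pseudo_labels` would then serve as training targets for the final classifier; how ICCE actually resolves conflicts between heads may differ substantially from this vote.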