🤖 AI Summary
In unsupervised visual representation learning, feature spaces often struggle to simultaneously achieve inter-class separation and intra-class compactness. To address this, we propose Cluster Contrast (CueCo), an end-to-end framework that jointly optimizes contrastive and clustering objectives. Its core component is a query-key architecture in which a momentum-updated key encoder enables joint optimization of a contrastive loss and a clustering-driven compactness constraint, explicitly structuring the feature distribution. Built on a ResNet-18 backbone, CueCo requires neither manual annotations nor strong data augmentations. Under the standard linear evaluation protocol, it achieves 91.40%, 68.56%, and 78.65% top-1 accuracy on CIFAR-10, CIFAR-100, and ImageNet-100, respectively, outperforming state-of-the-art unsupervised methods. CueCo thus offers an interpretable, structurally guided paradigm for unsupervised representation learning.
📝 Abstract
We introduce Cluster Contrast (CueCo), a novel approach to unsupervised visual representation learning that effectively combines the strengths of contrastive learning and clustering methods. Inspired by recent advances in both areas, CueCo is designed to simultaneously scatter and align feature representations within the feature space. The method uses two encoders, a query network and a key network, where the key network's parameters are updated as a slow-moving (momentum) average of the query network's parameters. CueCo employs a contrastive loss to push dissimilar features apart, enhancing inter-class separation, and a clustering objective to pull together features of the same cluster, promoting intra-class compactness. Our method achieves 91.40% top-1 classification accuracy on CIFAR-10, 68.56% on CIFAR-100, and 78.65% on ImageNet-100 under linear evaluation with a ResNet-18 backbone. By integrating contrastive learning with clustering, CueCo sets a new direction for advancing unsupervised visual representation learning.
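The three ingredients described above — the momentum update of the key encoder, the contrastive loss that scatters features, and the clustering objective that compacts them — can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the momentum value `m = 0.999`, the temperature `tau = 0.07`, and the squared-distance form of the compactness term are common choices assumed here, and the paper may use different hyperparameters or loss formulations.

```python
import numpy as np

def momentum_update(query_params, key_params, m=0.999):
    """Update key-encoder parameters as a slow-moving average of the
    query encoder's parameters (m = 0.999 is an assumed typical value)."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

def contrastive_loss(q, k_pos, k_negs, tau=0.07):
    """InfoNCE-style loss: pull q toward its positive key, push it away
    from negative keys (inter-class separation)."""
    q = q / np.linalg.norm(q)
    keys = [k_pos] + list(k_negs)
    logits = np.array([q @ (k / np.linalg.norm(k)) for k in keys]) / tau
    logits -= logits.max()  # numerical stability before exponentiation
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

def compactness_loss(features, assignments, centroids):
    """Clustering objective: mean squared distance of each feature to its
    assigned cluster centroid (intra-class compactness)."""
    return float(np.mean([np.sum((f - centroids[c]) ** 2)
                          for f, c in zip(features, assignments)]))
```

In a training loop, the query encoder would be updated by gradients of the combined loss while the key encoder receives only the momentum update, mirroring the query-key design described above.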