🤖 AI Summary
Existing language-assisted image clustering methods suffer from insufficient inter-class discriminability due to overly high textual feature similarity and are constrained by predefined image-text alignments, limiting the expressive potential of the textual modality. To address these limitations, this work proposes a novel approach that generates more discriminative self-supervised signals by modeling cross-modal relationships and introduces learnable, category-level continuous semantic centers via prompt-based learning to enhance both clustering performance and interpretability. By integrating vision-language models, cross-modal relational modeling, and self-supervised clustering, the proposed method achieves an average improvement of 2.6% over state-of-the-art techniques across eight benchmark datasets, while the learned semantic centers demonstrate strong semantic interpretability.
📝 Abstract
Language-Assisted Image Clustering (LAIC) uses vision-language models (VLMs) to augment input images with additional text and thereby improve clustering performance. Despite recent progress, existing LAIC methods often overlook two issues: (i) the textual features constructed for each image are highly similar, leading to weak inter-class discriminability; (ii) the clustering step is restricted to pre-built image-text alignments, which limits how well the text modality can be exploited. To address these issues, we propose a new LAIC framework with two complementary components. First, we exploit cross-modal relations to produce more discriminative self-supervision signals for clustering, an approach that is compatible with the training mechanisms of most VLMs. Second, we learn category-wise continuous semantic centers via prompt learning to produce the final clustering assignments. Extensive experiments on eight benchmark datasets demonstrate that our method achieves an average improvement of 2.6% over state-of-the-art methods, and the learned semantic centers exhibit strong interpretability. Code is available in the supplementary material.