TDEC: Deep Embedded Image Clustering with Transformer and Distribution Information

📅 2026-03-23
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This work addresses the limitations of existing deep image clustering methods, which often neglect global dependencies among image regions and struggle with complex scenarios due to high-dimensional features and simplistic distance metrics. To overcome these challenges, the authors propose TDEC, to their knowledge the first approach to integrate Transformers into a deep embedded clustering framework. TDEC employs a Transformer-based encoder to model global relationships, incorporates a learnable dimensionality-reduction module to construct a clustering-friendly low-dimensional embedding space, and leverages feature distribution information to generate reliable self-supervisory signals. This enables end-to-end joint optimization of feature representation, dimensionality reduction, and cluster assignment. Extensive experiments demonstrate that TDEC significantly outperforms state-of-the-art methods across multiple challenging image datasets, exhibiting superior clustering performance and robust adaptability to variations in data scale, number of categories, and scene complexity.
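To make the pipeline concrete, below is a minimal PyTorch sketch of a TDEC-style model under our own assumptions: the class name, layer sizes, and the DEC-style Student-t soft assignment sharpened by a target distribution are illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of a TDEC-style pipeline (not the authors' code):
# a Transformer encoder over image patches, a learnable dim-reduction head,
# and DEC-style Student-t soft assignments refined by a target distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TDECSketch(nn.Module):
    def __init__(self, img_size=32, patch=4, dim=128, embed=10, n_clusters=10):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding: split the image into patches, project each to `dim`.
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        # Transformer encoder models global dependencies among patches.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Learnable dimensionality reduction to a clustering-friendly space.
        self.reduce = nn.Linear(dim, embed)
        # Cluster centroids, optimized jointly with the network.
        self.centroids = nn.Parameter(torch.randn(n_clusters, embed))

    def forward(self, x):
        tokens = self.patchify(x).flatten(2).transpose(1, 2) + self.pos
        z = self.encoder(tokens).mean(dim=1)   # global feature per image
        z = self.reduce(z)                     # low-dimensional embedding
        # Student-t soft assignment (as in DEC, Xie et al. 2016).
        d2 = torch.cdist(z, self.centroids).pow(2)
        q = (1.0 + d2).reciprocal()
        return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # Sharpened target: emphasizes confident assignments, normalized per cluster.
    w = q.pow(2) / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

# Joint training: KL(P || Q) refines features, embedding, and assignments.
model = TDECSketch()
x = torch.randn(8, 3, 32, 32)
q = model(x)
loss = F.kl_div(q.log(), target_distribution(q).detach(), reduction='batchmean')
loss.backward()
```

In practice, deep embedded clustering models are typically pretrained (e.g., with a reconstruction objective) before the KL-based clustering stage; the sketch omits that step for brevity.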
📝 Abstract
Image clustering is a crucial but challenging task in multimedia machine learning. Recently, combining clustering with deep learning has achieved promising performance over conventional methods on high-dimensional image data. Unfortunately, existing deep clustering (DC) methods often ignore information fusion with a global perception field among different image regions when clustering images, especially complex ones. Additionally, the learned features are usually clustering-unfriendly in terms of dimensionality, and the clustering relies only on simple distance information. In this regard, we propose TDEC, a deep embedded image clustering method that, for the first time to our knowledge, jointly considers feature representation, dimensional preference, and robust assignment for image clustering. Specifically, we introduce the Transformer to form a novel T-Encoder module that learns discriminative features with global dependencies, while a Dim-Reduction block builds a low-dimensional space that favors clustering. Moreover, the distribution information of the embedded features is exploited during clustering to provide reliable supervisory signals for joint training. Our method is robust and adapts flexibly to data size, the number of clusters, and scene complexity. More importantly, the clustering performance of TDEC is much higher than that of recent competitors. Extensive experiments against state-of-the-art approaches on complex datasets show the superiority of TDEC.
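In the deep embedded clustering literature, "distribution information as a supervisory signal" is standardly realized as a Student-t soft assignment sharpened into a target distribution (Xie et al., 2016); TDEC's exact variant may differ, but the canonical formulation reads:

```latex
% Standard DEC-style objective (Xie et al., 2016); shown for context only,
% TDEC's exact variant may differ.
q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1}}
              {\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2\right)^{-1}},
\qquad
p_{ij} = \frac{q_{ij}^{2} / \sum_i q_{ij}}
              {\sum_{j'} \left( q_{ij'}^{2} / \sum_i q_{ij'} \right)},
\qquad
\mathcal{L} = \mathrm{KL}(P \parallel Q)
            = \sum_{i}\sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
```

Here $z_i$ is the low-dimensional embedding of image $i$ and $\mu_j$ the $j$-th cluster centroid, both optimized jointly with the network.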
Problem

Research questions and friction points this paper is trying to address.

image clustering
deep clustering
feature representation
global perception
distribution information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer
Deep Embedded Clustering
Feature Distribution
Dimensionality Reduction
Global Dependency
👥 Authors
Ruilin Zhang
Harbin Institute of Technology, Shenzhen
Haiyang Zheng
Harbin Institute of Technology, Shenzhen
Hongpeng Wang
Robotics Institute, Nankai University
Intelligent Robotics, Artificial Intelligence