CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering

📅 2026-03-23
🤖 AI Summary
This work addresses the challenge of automatically determining the optimal number of clusters for high-dimensional, complex data, such as large-scale images, without any prior information. It proposes a method that avoids assumptions about the data distribution and does not require complete clustering results. The approach reformulates cluster number estimation as a dynamic comparison of the positional relationships among cluster centers and, according to the authors for the first time, introduces a sample confidence filtering mechanism that excludes low-confidence boundary samples. Combining bipartite graph modeling with a pairwise center-matching strategy makes the method robust. Experiments on challenging benchmarks, including CIFAR-10 and STL-10, show that it outperforms current state-of-the-art techniques and is more robust and adaptable.
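The summary does not specify how sample confidence is measured. As a minimal sketch, assuming confidence is the margin between a sample's distances to its nearest and second-nearest cluster centers (an illustrative choice, not necessarily the one CNMBI uses), low-confidence boundary samples could be filtered as below. The names `margin_confidence` and `filter_boundary` and the default `threshold` of 0.3 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def margin_confidence(x: Point, centers: Sequence[Point]) -> float:
    """Hypothetical confidence score in [0, 1]: 1 - d1 / d2.

    d1 and d2 are the Euclidean distances from x to its nearest and
    second-nearest centers. A score near 0 means x lies almost equidistant
    from two centers, i.e. near a cluster boundary.
    """
    if len(centers) < 2:
        raise ValueError("need at least two centers")
    d1, d2 = sorted(math.dist(x, c) for c in centers)[:2]
    if d2 == 0.0:
        # x coincides with two centers at once; no basis for confidence.
        return 0.0
    return 1.0 - d1 / d2


def filter_boundary(
    points: Sequence[Point], centers: Sequence[Point], threshold: float = 0.3
) -> List[Point]:
    """Keep only samples whose confidence reaches the (hypothetical) threshold."""
    return [p for p in points if margin_confidence(p, centers) >= threshold]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0)]
    points = [(1.0, 0.0), (5.0, 0.0), (4.8, 0.0), (9.0, 0.0)]
    print(filter_boundary(points, centers))  # [(1.0, 0.0), (9.0, 0.0)]
```

The two samples near the midpoint between the centers score close to 0 and are discarded, while the samples near a center are kept. A margin ratio is used here only because it needs no extra model; any per-sample confidence score could be substituted.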
📝 Abstract
One of the main challenges in data mining is choosing the optimal number of clusters without prior information. Existing methods usually follow the philosophy of cluster validation and therefore carry underlying assumptions about the data distribution, which prevents their application to complex real-world data such as large-scale images and high-dimensional data. To address this, we propose an approach named CNMBI. Leveraging the distribution information inherent in the data space, we map the task to a dynamic comparison of the positional behavior of cluster centers, without relying on complete clustering results or designing a complex validity index as previous methods do. Bipartite graph theory is then employed to model this process efficiently. Additionally, we observe that samples differ in confidence and therefore actively remove low-confidence ones; to our knowledge, this is the first time sample confidence has been considered in cluster number determination. CNMBI is robust and offers more flexibility with respect to the dimensionality and shape of the target data (e.g., CIFAR-10 and STL-10). Extensive comparisons with state-of-the-art competitors on various challenging datasets demonstrate the superiority of our method.
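The abstract models the comparison of cluster centers as a bipartite graph but does not give the matching criterion. The sketch below assumes one simple reading: two sets of k centers (for example, from two clustering runs) form the two sides of a complete bipartite graph weighted by Euclidean distance, a minimum-cost perfect matching pairs them, and the k whose centers agree best is selected. This is a generic stability-style heuristic for illustration only; `match_centers`, `choose_k`, and the selection rule are hypothetical and are not CNMBI's published algorithm.

```python
import math
from itertools import permutations
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


def match_centers(
    a: Sequence[Point], b: Sequence[Point]
) -> Tuple[Tuple[int, ...], float]:
    """Minimum-cost perfect matching between two equal-size center sets.

    a and b are the two sides of a complete bipartite graph whose edge
    weights are Euclidean distances. Enumerating permutations is exact but
    O(k!), so this is only suitable for small k.
    Returns (perm, mean_cost), where a[i] is matched to b[perm[i]].
    """
    if len(a) != len(b) or not a:
        raise ValueError("center sets must be non-empty and of equal size")
    best_perm: Tuple[int, ...] = ()
    best_cost = math.inf
    for perm in permutations(range(len(b))):
        cost = sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost / len(a)


def choose_k(
    center_sets: Dict[int, Tuple[Sequence[Point], Sequence[Point]]]
) -> int:
    """Hypothetical rule: return the k whose two center sets agree best,
    i.e. have the smallest mean matched distance."""
    return min(center_sets, key=lambda k: match_centers(*center_sets[k])[1])


if __name__ == "__main__":
    runs = {
        2: ([(0.0, 0.0), (10.0, 0.0)], [(10.5, 0.0), (0.5, 0.0)]),
        3: ([(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)],
            [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]),
    }
    print(match_centers(*runs[2]))  # ((1, 0), 0.5)
    print(choose_k(runs))  # 2
```

For larger k, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time instead of enumerating all k! permutations.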
Problem

Research questions and friction points this paper is trying to address.

cluster number determination
unsupervised clustering
high-dimensional data
complex data
cluster validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

cluster number determination
center pairwise matching
boundary filtering
bipartite graph modeling
confidence-aware clustering
Authors
Ruilin Zhang
Harbin Institute of Technology, Shenzhen, Shenzhen, China
Haiyang Zheng
Harbin Institute of Technology, Shenzhen, Shenzhen, China
Hongpeng Wang
Robotic Institute, Nankai University
Intelligent Robotics, Artificial Intelligence