🤖 AI Summary
Existing clustering methods (e.g., K-means, DBSCAN) struggle with multimodal, non-convex, nested, and noisy data, often relying on predefined parameters or global structural assumptions. To address these limitations, we propose Depth-based Local Center Clustering (DLCC). DLCC introduces the concept of *local data depth*, which overcomes the inability of global depth to characterize multimodal structures; designs a density-based internal validity metric to assess the quality of non-convex clusters; and combines subset-based depth ranking with a parameter-free, adaptive mechanism for identifying local centers. Extensive experiments demonstrate that DLCC significantly outperforms state-of-the-art methods on datasets featuring diverse shapes, non-convexity, nesting, and noise. Crucially, DLCC requires no prior specification of the number of clusters, exhibits strong generalizability, and maintains robustness and practical utility across heterogeneous scenarios.
📝 Abstract
Cluster analysis, or clustering, plays a crucial role across numerous scientific and engineering domains. Despite the wealth of clustering methods proposed over the past decades, each method is typically designed for specific scenarios and presents certain limitations in practical applications. In this paper, we propose depth-based local center clustering (DLCC). This novel method makes use of data depth, which is known to produce a center-outward ordering of sample points in a multivariate space. However, data depth typically fails to capture the multimodal characteristics of data, something of the utmost importance in the context of clustering. To overcome this, DLCC makes use of a local version of data depth that is based on subsets of data. From this, local centers can be identified as well as clusters of varying shapes. Furthermore, we propose a new internal metric based on density-based clustering to evaluate clustering performance on non-convex clusters. Overall, DLCC is a flexible clustering approach that seems to overcome some limitations of traditional clustering methods, thereby enhancing data analysis capabilities across a wide range of application scenarios.
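The contrast the abstract draws can be illustrated with a small sketch. Global depth ranks the whole sample from the center outward, while depth computed on subsets can expose one center per mode. This is not DLCC. As a stand-in, it assumes spatial (L1) depth as the depth function and the k nearest neighbours of each point as that point's local subset. It also uses a hypothetical rule that a point is a local center when no point in its subset is deeper. The names `spatial_depth`, `knn_subset`, `local_depths`, `local_centers`, and `assign_to_centers` are illustrative only, and the paper's own depth notion, subset construction, and center-selection rule may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def spatial_depth(x: Point, sample: Sequence[Point]) -> float:
    """Spatial (L1) depth: 1 minus the norm of the mean unit vector from sample points to x.

    Values near 1 mean x is surrounded on all sides; values near 0 mean x lies
    at the edge of the sample. Points equal to x contribute a zero vector.
    """
    total = [0.0] * len(x)
    for y in sample:
        diff = [xi - yi for xi, yi in zip(x, y)]
        norm = math.hypot(*diff)
        if norm > 0:
            for j, d in enumerate(diff):
                total[j] += d / norm
    return 1.0 - math.hypot(*total) / len(sample)


def knn_subset(i: int, data: Sequence[Point], k: int) -> List[int]:
    """Hypothetical local subset: index i plus the indices of its k nearest neighbours."""
    order = sorted(range(len(data)), key=lambda j: math.dist(data[i], data[j]))
    return order[: k + 1]  # data[i] has distance 0, so it is included when points are distinct


def local_depths(data: Sequence[Point], k: int) -> List[float]:
    """Depth of each point computed only against its own local subset."""
    return [spatial_depth(data[i], [data[j] for j in knn_subset(i, data, k)])
            for i in range(len(data))]


def local_centers(data: Sequence[Point], k: int) -> List[int]:
    """Hypothetical rule: a point is a local center if no point in its subset is deeper."""
    depths = local_depths(data, k)
    return [i for i in range(len(data))
            if all(depths[i] >= depths[j] for j in knn_subset(i, data, k))]


def assign_to_centers(data: Sequence[Point], centers: List[int]) -> List[int]:
    """Placeholder assignment: label each point with the index of its nearest center."""
    return [min(centers, key=lambda c: math.dist(p, data[c])) for p in data]


# Toy data: two 3x3 grids whose centers are (0, 0) and (10, 0).
grid = [(float(a), float(b)) for a in (-1, 0, 1) for b in (-1, 0, 1)]
data = grid + [(a + 10.0, b) for a, b in grid]

global_depth = [spatial_depth(p, data) for p in data]
deepest = max(range(len(data)), key=lambda i: global_depth[i])
centers = local_centers(data, k=8)
labels = assign_to_centers(data, centers)

print("globally deepest point:", data[deepest])
print("local centers:", [data[c] for c in centers])
```

On this toy data, the globally deepest sample point lands on the inner edge of one grid, facing the other grid (x = 1 or x = 9). By contrast, the local rule returns the two grid centers, (0, 0) and (10, 0). Nearest-center assignment is only a placeholder and cannot by itself recover non-convex shapes, whereas the abstract states that DLCC also identifies clusters of varying shapes.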