Hierarchical Single-Linkage Clustering for Community Detection with Overlaps and Outliers

📅 2025-09-02
🤖 AI Summary
Traditional graph clustering methods rest on restrictive assumptions, such as hard node assignments (every node in exactly one community) and the absence of outliers, which limits their applicability to real-world graphs. To address this, the paper studies a general graph clustering framework intended to detect overlapping communities and identify noise nodes at the same time. The method adapts the core principles of HDBSCAN to graph-structured data. It first reweights the adjacency structure using one of several node- or edge-level similarity scores, chosen to make clustering robust to outliers. It then applies HDBSCAN's hierarchical single-linkage clustering to discover communities and outliers jointly. The framework requires no prior specification of the number of communities and can be applied to graphs with differing structure. Experiments on synthetic and real-world benchmarks report competitive performance in both overlapping community detection and noise identification, although no single similarity score is best on every type of graph.
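As an illustration of the reweighting step, the sketch below turns each edge into a distance using the Jaccard similarity of the endpoints' closed neighborhoods. Jaccard is an assumed stand-in for one of the node/edge similarity scores the paper compares; this summary does not say which scores are used. The graph representation (a dict mapping each node to its neighbor set) and the function name `jaccard_edge_distances` are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Set

# Hypothetical representation: undirected graph as node -> set of neighbors.
# Every neighbor must also appear as a key.
Graph = Dict[Hashable, Set[Hashable]]


def jaccard_edge_distances(graph: Graph) -> Dict[FrozenSet[Hashable], float]:
    """Map each undirected edge {u, v} to 1 - Jaccard(N[u], N[v]).

    N[x] is the closed neighborhood (x plus its neighbors), so adjacent nodes
    always share at least themselves. Edges inside dense groups get distances
    near 0; bridges between groups get larger distances.
    """
    dist: Dict[FrozenSet[Hashable], float] = {}
    for u, nbrs in graph.items():
        for v in nbrs:
            edge = frozenset((u, v))
            if edge in dist:  # each undirected edge is seen from both ends
                continue
            nu = nbrs | {u}
            nv = graph[v] | {v}
            dist[edge] = 1.0 - len(nu & nv) / len(nu | nv)
    return dist


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    for edge, d in sorted(jaccard_edge_distances(g).items(), key=lambda kv: kv[1]):
        print(sorted(edge), round(d, 3))
```

On this toy graph the edge 0-1 inside a triangle gets distance 0 and the bridge 2-3 gets distance 2/3, so single-linkage joins each triangle before it crosses the bridge.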

📝 Abstract
Most community detection approaches make very strong assumptions about communities in the data, for example that every vertex belongs to exactly one community (so that the communities form a partition). For vector data, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) has emerged as a leading clustering algorithm that allows for outlier points that do not belong to any cluster. The first step in HDBSCAN is to redefine the distance between vectors so that single-linkage clustering becomes effective and robust to noise. Many community detection algorithms begin with a similar step that increases the weights of edges between similar nodes and decreases the weights of noisy edges. In this paper, we apply the hierarchical single-linkage clustering algorithm from HDBSCAN to a variety of node and edge similarity scores to see whether any of them yields an algorithm that can effectively detect clusters while allowing for outliers. In experiments on synthetic and real-world data sets, we find that no single method is optimal for every type of graph, but the strong overall performance indicates that hierarchical single-linkage clustering is a viable paradigm for graph clustering.
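HDBSCAN's first step, mentioned above, replaces the raw distance with the mutual reachability distance d_mreach(a, b) = max(core_k(a), core_k(b), d(a, b)), where core_k(x) is the distance from x to its k-th nearest neighbor. Below is a minimal pure-Python sketch for small vector data sets. It assumes Euclidean distance and counts only other points when finding the k-th neighbor (implementations differ on whether a point counts as its own neighbor). The function names are illustrative, not the API of any HDBSCAN library.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def core_distances(points: List[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest other point (1 <= k < n)."""
    cores = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(others[k - 1])
    return cores


def mutual_reachability(points: List[Point], k: int) -> List[List[float]]:
    """Dense matrix of max(core_k(a), core_k(b), d(a, b)); the diagonal is 0.

    Quadratic in the number of points, so only suitable for small examples.
    """
    core = core_distances(points, k)
    n = len(points)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                m[i][j] = max(core[i], core[j], math.dist(points[i], points[j]))
    return m


if __name__ == "__main__":
    # Three nearby points and one far-away point (a likely outlier).
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
    for row in mutual_reachability(pts, k=1):
        print([round(x, 2) for x in row])
```

Because every distance involving a sparse point is raised to at least that point's core distance, isolated points join the single-linkage hierarchy late, which is what lets them be flagged as noise.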
Problem

Detecting overlapping communities and outliers in graphs
Evaluating hierarchical single-linkage clustering on graphs reweighted by node/edge similarity scores
Assessing algorithm performance across diverse graph types
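One possible way to obtain overlapping communities from hard single-linkage labels is to cluster edges rather than nodes (using edge similarity scores). Each node then receives the set of labels of its incident edges. A node whose edges are all labeled as noise ends up with no community and is reported as an outlier. This is a sketch under that assumption; the summary does not state that the paper uses this construction, and `node_memberships`, `outliers`, and the -1 noise label are hypothetical conventions.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set


def node_memberships(
    nodes: Iterable[Hashable],
    edge_labels: Dict[FrozenSet[Hashable], int],
) -> Dict[Hashable, Set[int]]:
    """Lift edge cluster labels to (possibly overlapping) node memberships.

    edge_labels maps each undirected edge {u, v} to a cluster id >= 0,
    or to -1 if the edge was labeled as noise (hypothetical convention).
    """
    members: Dict[Hashable, Set[int]] = {v: set() for v in nodes}
    for edge, label in edge_labels.items():
        if label < 0:
            continue  # noise edges contribute no membership
        for v in edge:
            members[v].add(label)
    return members


def outliers(members: Dict[Hashable, Set[int]]) -> Set[Hashable]:
    """Nodes that belong to no community."""
    return {v for v, labels in members.items() if not labels}


if __name__ == "__main__":
    # Triangles {0, 1, 2} (cluster 0) and {2, 3, 4} (cluster 1) share node 2;
    # the pendant edge 4-5 was labeled as noise.
    labels = {
        frozenset((0, 1)): 0, frozenset((0, 2)): 0, frozenset((1, 2)): 0,
        frozenset((2, 3)): 1, frozenset((2, 4)): 1, frozenset((3, 4)): 1,
        frozenset((4, 5)): -1,
    }
    m = node_memberships(range(6), labels)
    print(m[2], outliers(m))
```

Here node 2 belongs to both communities, while node 5, attached only by a noise edge, is an outlier.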
Innovation

Applies HDBSCAN hierarchical single-linkage to graphs
Uses node/edge similarity scores for clustering
Allows outlier detection in community identification
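The single-linkage hierarchy over an edge-weighted graph can be built with Kruskal's algorithm. Edges are processed in increasing distance, each union of two components is one merge of the dendrogram, and the accepted edges form a minimum spanning forest. The sketch below records those merges. Then, as a deliberate simplification, it extracts flat clusters with a single distance threshold and labels components smaller than a hypothetical `min_cluster_size` as outliers (-1). HDBSCAN proper instead condenses the hierarchy and selects clusters by stability, which is not reproduced here; all names and parameters are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

Merge = Tuple[Hashable, Hashable, float]


class UnionFind:
    def __init__(self, items: Sequence[Hashable]) -> None:
        self.parent = {x: x for x in items}

    def find(self, x: Hashable) -> Hashable:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def single_linkage(nodes: Sequence[Hashable],
                   edge_dist: Dict[FrozenSet[Hashable], float]) -> List[Merge]:
    """Kruskal's algorithm: each successful union is one single-linkage merge.

    Assumes no self-loops (every edge key has exactly two endpoints).
    """
    uf = UnionFind(nodes)
    merges: List[Merge] = []
    for edge, d in sorted(edge_dist.items(), key=lambda kv: kv[1]):
        u, v = tuple(edge)
        if uf.union(u, v):
            merges.append((u, v, d))
    return merges


def flat_clusters(nodes: Sequence[Hashable], merges: List[Merge],
                  threshold: float, min_cluster_size: int) -> Dict[Hashable, int]:
    """Cut the hierarchy at `threshold`; components below min_cluster_size get -1."""
    uf = UnionFind(nodes)
    for u, v, d in merges:
        if d <= threshold:
            uf.union(u, v)
    sizes: Dict[Hashable, int] = {}
    for v in nodes:
        r = uf.find(v)
        sizes[r] = sizes.get(r, 0) + 1
    labels: Dict[Hashable, int] = {}
    ids: Dict[Hashable, int] = {}
    for v in nodes:
        r = uf.find(v)
        labels[v] = -1 if sizes[r] < min_cluster_size else ids.setdefault(r, len(ids))
    return labels


if __name__ == "__main__":
    nodes = list(range(7))
    edge_dist = {
        frozenset((0, 1)): 0.0, frozenset((0, 2)): 0.25, frozenset((1, 2)): 0.25,
        frozenset((3, 4)): 0.0, frozenset((3, 5)): 0.25, frozenset((4, 5)): 0.25,
        frozenset((2, 3)): 0.67, frozenset((5, 6)): 0.9,
    }
    merges = single_linkage(nodes, edge_dist)
    print(flat_clusters(nodes, merges, threshold=0.5, min_cluster_size=3))
```

Sweeping `threshold` over the recorded merge distances reproduces every level of the hierarchy, so no number of communities has to be fixed in advance; choosing a level automatically is the part that HDBSCAN's stability criterion handles.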