🤖 AI Summary
This paper studies the connectivity-constrained $k$-median problem. The input is a point set in a metric space and an unweighted, undirected connectivity graph $G$ on the same points. The task is to compute $k$ centers and corresponding clusters such that each cluster induces a connected subgraph of $G$ and the $k$-median objective is minimized. We study a model that allows clusters to overlap and design an $O(k^2 \log n)$-approximation algorithm for it. We prove that the non-overlapping variant is $\Omega(n^{1-\varepsilon})$-hard to approximate on general connectivity graphs, assuming $\text{P} \neq \text{NP}$. For tree-structured connectivity graphs, we provide a polynomial-time exact algorithm for the non-overlapping variant. The key contribution is a treatment of the overlapping and non-overlapping settings within a single framework, combining graph connectivity analysis, decomposition-and-rounding techniques, and combinatorial optimization.
📝 Abstract
The connected $k$-median problem is a constrained clustering problem that combines distance-based $k$-clustering with connectivity information. The input consists of a metric space and an unweighted, undirected connectivity graph $G$ that may be completely unrelated to the metric. The goal is to compute $k$ centers and corresponding clusters such that each cluster induces a connected subgraph of $G$ and the $k$-median cost is minimized.
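To make the objective concrete, the following Python sketch evaluates a candidate solution. It checks that each cluster contains its center and induces a connected subgraph of $G$ (breadth-first search that never leaves the cluster), then sums each point's distance to its center. The function names, the adjacency-dict representation of $G$, and the requirement that a center lie inside its own cluster are illustrative assumptions, not taken from the paper. For overlapping clusters the sketch also assumes that a point pays the distance to the nearest center among the clusters containing it; the paper's overlapping objective may be defined differently.

```python
from collections import deque
from typing import Dict, List, Sequence, Set


def induces_connected_subgraph(cluster: Set[int], adj: Dict[int, List[int]]) -> bool:
    """Return True iff `cluster` induces a connected subgraph of G.

    Runs a BFS that only follows edges whose endpoints both lie in the cluster.
    """
    if not cluster:
        return False
    start = next(iter(cluster))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w in cluster and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == cluster


def connected_kmedian_cost(
    dist: Sequence[Sequence[float]],  # metric: dist[p][q]
    adj: Dict[int, List[int]],        # connectivity graph G, unrelated to dist
    centers: Sequence[int],
    clusters: Sequence[Set[int]],
) -> float:
    """Hypothetical evaluator: validate a solution and return its k-median cost."""
    if len(centers) != len(clusters):
        raise ValueError("need exactly one center per cluster")
    for c, cl in zip(centers, clusters):
        # Assumption: a center must belong to its own cluster.
        if c not in cl:
            raise ValueError(f"center {c} is not in its cluster")
        if not induces_connected_subgraph(cl, adj):
            raise ValueError(f"cluster {sorted(cl)} is not connected in G")
    total = 0.0
    for p in range(len(dist)):
        # Assumption for overlaps: p pays its distance to the nearest center
        # among all clusters containing it (for disjoint clusters: its own center).
        options = [dist[p][c] for c, cl in zip(centers, clusters) if p in cl]
        if not options:
            raise ValueError(f"point {p} is not covered by any cluster")
        total += min(options)
    return total


if __name__ == "__main__":
    xs = [0, 10, 1, 11]  # points on a line; the metric is |x_p - x_q|
    dist = [[abs(a - b) for b in xs] for a in xs]
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # G is the path 0-1-2-3
    print(connected_kmedian_cost(dist, adj, [0, 2], [{0, 1}, {2, 3}]))  # 20.0
```

In this toy instance the metrically natural grouping $\{0,2\},\{1,3\}$ is infeasible, because neither pair is adjacent in $G$: the connectivity graph, not the metric, decides which groupings are allowed.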
The problem has applications in fields as diverse as geodesy (particularly districting), social network analysis (especially community detection), and bioinformatics. We study a version with overlapping clusters, where a point may belong to several clusters; this is natural for community detection. This variant is $\Omega(\log n)$-hard to approximate, and our main result is an $\mathcal{O}(k^2 \log n)$-approximation algorithm for it. We complement it with an $\Omega(n^{1-\varepsilon})$-hardness result for disjoint clusters (no overlap) on general connectivity graphs, as well as an exact algorithm for this disjoint setting when the connectivity graph is a tree.
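The structure of the tree case can be illustrated with a brute-force baseline. On a tree, partitioning the vertices into exactly $k$ connected clusters is equivalent to deleting exactly $k-1$ tree edges, so the sketch below enumerates all such deletions and gives each resulting component its best center. This is an exponential-time illustration under stated assumptions (exactly $k$ non-empty, disjoint clusters; each center chosen from inside its own cluster), not the paper's exact algorithm, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def components_without(n: int, edges: Sequence[Tuple[int, int]], removed: Set[int]) -> List[List[int]]:
    """Connected components of the tree after deleting the edges whose indices are in `removed`."""
    parent = list(range(n))

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i, (u, v) in enumerate(edges):
        if i not in removed:
            parent[find(u)] = find(v)
    groups = {}
    for u in range(n):
        groups.setdefault(find(u), []).append(u)
    return list(groups.values())


def brute_force_tree_kmedian(dist, tree_edges, k):
    """Hypothetical exact baseline for disjoint clusters on a tree; exponential in k."""
    n = len(dist)
    if len(tree_edges) != n - 1:
        raise ValueError("expected a spanning tree with n - 1 edges")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    best_cost, best_solution = float("inf"), None
    # Exactly k connected clusters on a tree <=> exactly k - 1 deleted edges.
    for removed in combinations(range(n - 1), k - 1):
        clusters = components_without(n, tree_edges, set(removed))
        cost, centers = 0, []
        for cl in clusters:
            # Assumption: each center is the best 1-median chosen from inside its cluster.
            c = min(cl, key=lambda cand: sum(dist[p][cand] for p in cl))
            centers.append(c)
            cost += sum(dist[p][c] for p in cl)
        if cost < best_cost:
            best_cost, best_solution = cost, (centers, clusters)
    return best_cost, best_solution


if __name__ == "__main__":
    xs = [0, 10, 1, 11]  # points on a line; the metric is |x_p - x_q|
    dist = [[abs(a - b) for b in xs] for a in xs]
    path = [(0, 1), (1, 2), (2, 3)]  # a path is a tree
    print(brute_force_tree_kmedian(dist, path, 2)[0])  # 10
```

Because there are $\binom{n-1}{k-1}$ edge subsets to try, this baseline is only practical for small inputs; it is mainly useful as a reference for checking a faster tree algorithm on toy instances.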