🤖 AI Summary
This paper studies Constrained Correlation Clustering: Correlation Clustering (CC) in which some pairs of nodes carry hard must-link or cannot-link constraints. The output clustering must satisfy every hard constraint while minimizing the number of violated pairwise preferences. The problem is APX-hard, and the previous 3-approximation by van Zuylen et al. [SODA '07] requires Ω(n^{3ω}) time. Using a more combinatorial approach, the paper gives a 16-approximation that runs in Õ(n³) time, trading a slightly weaker approximation factor for a significantly faster algorithm. The result relies on properties of CC-PIVOT, an influential algorithm for unconstrained CC. As a byproduct, the paper derandomizes CC-PIVOT while preserving its 3-approximation guarantee, and shows that this factor is tight for pivot-based clustering: on some instances no ordering of the pivots achieves a (3-ε)-approximation, for any constant ε. Finally, the paper introduces a node-weighted version of CC and shows that it can be approximated within a factor of 3 using the insights gained from the constrained setting.
📝 Abstract
In the Correlation Clustering problem we are given $n$ nodes, and a preference for each pair of nodes indicating whether we prefer the two endpoints to be in the same cluster or not. The output is a clustering inducing the minimum number of violated preferences. In certain cases, however, the preference between some pairs may be too important to be violated. The constrained version of this problem specifies pairs of nodes that must be in the same cluster as well as pairs that must not be in the same cluster (hard constraints). The output clustering has to satisfy all hard constraints while minimizing the number of violated preferences. Constrained Correlation Clustering is APX-Hard and has been approximated within a factor 3 by van Zuylen et al. [SODA '07] using $\Omega(n^{3\omega})$ time. In this work, using a more combinatorial approach, we show how to approximate this problem significantly faster at the cost of a slightly weaker approximation factor. In particular, our algorithm runs in $\widetilde{O}(n^3)$ time and approximates Constrained Correlation Clustering within a factor 16. To achieve our result we need properties guaranteed by a particular influential algorithm for (unconstrained) Correlation Clustering, the CC-PIVOT algorithm. This algorithm chooses a pivot node $u$, creates a cluster containing $u$ and all its preferred nodes, and recursively solves the rest of the problem. As a byproduct of our work, we provide a derandomization of the CC-PIVOT algorithm that still achieves the 3-approximation; furthermore, we show that there exist instances where no ordering of the pivots can give a $(3-\varepsilon)$-approximation, for any constant $\varepsilon$. Finally, we introduce a node-weighted version of Correlation Clustering, which can be approximated within factor 3 using our insights on Constrained Correlation Clustering.
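
The pivot procedure described in the abstract can be illustrated with a minimal Python sketch of unconstrained CC-PIVOT. This is not the paper's constrained algorithm or its derandomization. The function names `cc_pivot` and `disagreements` and the `pivot_order` parameter are hypothetical, introduced only for this example. The sketch assumes preferences are given as a list of "same cluster" pairs, with every other pair preferring separation, and it takes pivots in a caller-supplied order (the classic algorithm picks them uniformly at random). The recursion is unrolled into a loop over the remaining nodes.

```python
from itertools import combinations


def cc_pivot(nodes, positive, pivot_order=None):
    """Cluster `nodes` with the pivot rule (illustrative sketch).

    positive: iterable of pairs (u, v) that prefer to share a cluster;
              every other pair is assumed to prefer separation.
    pivot_order: hypothetical parameter fixing the order in which
              pivots are tried; defaults to the order of `nodes`.
    """
    # Adjacency sets over the "same cluster" preferences.
    adj = {v: set() for v in nodes}
    for u, v in positive:
        adj[u].add(v)
        adj[v].add(u)

    remaining = set(nodes)
    order = list(pivot_order) if pivot_order is not None else list(nodes)
    clusters = []
    for u in order:
        if u not in remaining:
            continue  # u was already absorbed by an earlier pivot
        # The pivot takes itself and all still-unclustered preferred nodes.
        cluster = {u} | (adj[u] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters


def disagreements(nodes, positive, clusters):
    """Count the pairs whose preference is violated by `clusters`."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    wants_same = {frozenset(p) for p in positive}
    cost = 0
    for u, v in combinations(nodes, 2):
        together = label[u] == label[v]
        if together != (frozenset((u, v)) in wants_same):
            cost += 1
    return cost
```

On the path 0, 1, 2 plus an isolated node 3 (positive pairs `(0, 1)` and `(1, 2)`), pivoting on 0 first yields the clusters `{0, 1}`, `{2}`, `{3}`, while pivoting on 1 first yields `{0, 1, 2}`, `{3}`. Each violates exactly one preference, which is the best possible on this instance because the triple 0, 1, 2 has conflicting preferences.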