🤖 AI Summary
This paper studies approximation algorithms for Correlation Clustering: given a complete graph with ± edge labels, the goal is to partition the vertices so as to minimize the number of misclassified edges, i.e., + edges across clusters plus − edges within clusters. The authors introduce the cluster LP, a unified and expressive linear program that subsumes all prior relaxations for the problem. Although exponential-sized, the cluster LP admits a (1+ε)-approximate solution in polynomial time via the Sherali–Adams hierarchy and the round-or-cut framework, enabling concise rounding schemes that avoid the correlated rounding errors inherent in earlier approaches. The main results are: (i) a simple rounding algorithm with an analytic 1.49-approximation guarantee; (ii) a 1.437-approximation established by solving a factor-revealing SDP, improving on the previous best ratios of 2.06 and 1.73; (iii) an integrality gap of 4/3 for the cluster LP, showing the 1.437 upper bound cannot be drastically improved; and (iv) an explicit NP-hardness of approximation with ratio 24/23 ≈ 1.043, where no explicit hardness ratio was known before.
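The summary does not write out the cluster LP itself. A plausible sketch, reconstructed from the description above (one variable $z_S$ per candidate cluster $S$, with pairwise co-clustering values $x_{uv}$ derived from them; the exact formulation in the paper may differ), is:

```latex
\begin{aligned}
\min \quad & \sum_{uv \in E^{+}} (1 - x_{uv}) \;+\; \sum_{uv \in E^{-}} x_{uv} \\
\text{s.t.} \quad & \textstyle\sum_{S \ni u} z_S = 1 && \forall u \in V, \\
& x_{uv} = \textstyle\sum_{S \supseteq \{u,v\}} z_S && \forall u \neq v, \\
& z_S \ge 0 && \forall\, \emptyset \neq S \subseteq V.
\end{aligned}
```

Here $E^{+}$ and $E^{-}$ denote the + and − labeled edge sets. An integral solution selects exactly one cluster containing each vertex, so $x_{uv} \in \{0,1\}$ indicates co-clustering and the objective counts exactly the misclassified edges.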
📝 Abstract
In the classic Correlation Clustering problem introduced by Bansal, Blum, and Chawla (FOCS 2002), the input is a complete graph whose edges are labeled either + or −, and the goal is to find a partition of the vertices that minimizes the number of + edges across parts plus the number of − edges within parts. Chawla, Makarychev, Schramm, and Yaroslavtsev (STOC 2015) gave a 2.06-approximation by providing a near-optimal rounding of the standard LP, and Cohen-Addad, Lee, Li, and Newman (FOCS 2022, 2023) finally bypassed the integrality gap of 2 for this LP, giving a 1.73-approximation for the problem.

While introducing new ideas for Correlation Clustering, their algorithm is more complicated than typical approximation algorithms in two respects: (1) it is based on two different relaxations with separate rounding algorithms connected by a round-or-cut procedure, and (2) each rounding algorithm must separately handle the seemingly inevitable correlated rounding errors that arise from correlated rounding of Sherali-Adams and other strong LP relaxations.

To create a simple and unified framework for Correlation Clustering, similar to those for typical approximate optimization tasks, we propose the cluster LP: a strong linear program that might tightly capture the approximability of Correlation Clustering and that unifies all previous relaxations for the problem. The cluster LP is exponential-sized, but we show that it can be (1+ε)-approximately solved in polynomial time for any ε > 0. This provides a framework for designing rounding algorithms without worrying about correlated rounding errors; these errors are handled uniformly in solving the relaxation. We demonstrate the power of the cluster LP by presenting a simple rounding algorithm and providing two analyses: one analytically proving a 1.49-approximation, and the other solving a factor-revealing SDP to show a 1.437-approximation.
Both analyses introduce principled methods for bounding the performance of the algorithm, yielding a significantly improved approximation guarantee. Finally, we prove an integrality gap of 4/3 for the cluster LP, showing that our 1.437 upper bound cannot be drastically improved. Our gap instance directly inspires an improved NP-hardness of approximation with ratio 24/23 ≈ 1.043; no explicit hardness ratio was known before.
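The disagreement objective described in the abstract (count + edges across parts plus − edges within parts) is simple enough to state in code. The following minimal sketch is an illustration of the objective only, not of any algorithm from the paper; the representation of labels and clusterings as Python dicts is our own choice.

```python
from itertools import combinations


def disagreements(labels, clustering):
    """Count misclassified edges of a correlation clustering instance:
    '+' edges whose endpoints lie in different clusters, plus
    '-' edges whose endpoints lie in the same cluster.

    labels: dict mapping frozenset({u, v}) -> '+' or '-'
    clustering: dict mapping each vertex -> its cluster id
    """
    cost = 0
    for edge, sign in labels.items():
        u, v = tuple(edge)
        same = clustering[u] == clustering[v]
        if sign == '+' and not same:
            cost += 1  # positive edge cut between clusters
        elif sign == '-' and same:
            cost += 1  # negative edge kept inside a cluster
    return cost


# Toy instance: a triangle with two '+' edges and one '-' edge.
labels = {
    frozenset({'a', 'b'}): '+',
    frozenset({'b', 'c'}): '+',
    frozenset({'a', 'c'}): '-',
}

# Merging all three vertices violates only the single '-' edge.
print(disagreements(labels, {'a': 0, 'b': 0, 'c': 0}))  # 1
```

On this instance no clustering achieves cost 0 (the triangle is "inconsistent"), which is the basic source of hardness: splitting everything into singletons instead costs 2, one per cut '+' edge.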