🤖 AI Summary
This paper studies sublinear-time approximation algorithms for correlation clustering on large-scale graphs. Addressing the high computational complexity of traditional LP relaxations—which struggle to balance accuracy and efficiency—we propose the first algorithm that computes a $(1+varepsilon)$-approximate feasible solution to the Cluster LP and simultaneously performs efficient rounding in $ ilde{O}(2^{mathrm{poly}(1/varepsilon)} n)$ time. Our approach integrates combinatorial optimization, double sampling, local graph exploration, and a novel LP rounding technique. The resulting algorithm achieves an approximation ratio of $(1.437+varepsilon)$, improving upon the prior best ratio while reducing runtime from the previous state-of-the-art $n^{mathrm{poly}(1/varepsilon)}$ to a truly sublinear function of $n$. This work bridges a fundamental theoretical gap in achieving both high accuracy and near-linear scalability for correlation clustering.
📝 Abstract
Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges. CCL+24 introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the Cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, CCL+24 showed how to find a feasible solution for the Cluster LP in time O(n^{ ext{poly}(1/eps)}) with objective value at most (1+epsilon) times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the Cluster LP, yielding a (1.437+eps)-approximation algorithm for the Correlation Clustering problem. The main technical result of this paper is a new approach to find a feasible solution for the Cluster LP with objective value at most (1+epsilon) of the optimum in time widetilde O(2^{ ext{poly}(1/eps)} n), where n is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast (1.437+epsilon)-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.