Solving the Correlation Cluster LP in Sublinear Time

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies fast approximation algorithms for correlation clustering on large graphs. The Cluster LP of CCL+24 has exponential size, and the previously known method for finding a near-optimal feasible solution takes $n^{\mathrm{poly}(1/\varepsilon)}$ time. The paper gives a new algorithm that computes a feasible solution to the Cluster LP with objective value at most $(1+\varepsilon)$ times the optimum, and performs the rounding, in $\tilde{O}(2^{\mathrm{poly}(1/\varepsilon)} n)$ time, where $n$ is the number of vertices. The approach integrates combinatorial optimization, double sampling, local graph exploration, and an efficient implementation of the LP rounding. The resulting algorithm achieves an approximation ratio of $(1.437+\varepsilon)$, matching the ratio obtained by CCL+24, while reducing the runtime from $n^{\mathrm{poly}(1/\varepsilon)}$ to near-linear in $n$, which is sublinear in the input size when the graph is dense. This work bridges the gap between state-of-the-art approximation guarantees and fast algorithms for correlation clustering.

📝 Abstract
Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges. CCL+24 introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the Cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, CCL+24 showed how to find a feasible solution for the Cluster LP in time $O(n^{\text{poly}(1/\varepsilon)})$ with objective value at most $(1+\varepsilon)$ times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the Cluster LP, yielding a $(1.437+\varepsilon)$-approximation algorithm for the Correlation Clustering problem. The main technical result of this paper is a new approach to find a feasible solution for the Cluster LP with objective value at most $(1+\varepsilon)$ of the optimum in time $\widetilde{O}(2^{\text{poly}(1/\varepsilon)} n)$, where $n$ is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast $(1.437+\varepsilon)$-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.
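To make the objective in the abstract concrete, here is a minimal Python sketch that counts the disagreements of a given clustering: edges that cross clusters plus non-edges that fall inside a cluster. The function name `correlation_clustering_cost` and its parameters are hypothetical, chosen only for illustration. This is a brute-force quadratic-time check of the objective, not the paper's algorithm.

```python
from itertools import combinations


def correlation_clustering_cost(n, edges, cluster_of):
    """Count disagreements of a clustering (illustrative helper, not from the paper).

    n          -- number of vertices, labeled 0..n-1
    edges      -- iterable of (u, v) pairs present in the input graph
    cluster_of -- list where cluster_of[v] is the cluster id of vertex v
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        has_edge = frozenset((u, v)) in edge_set
        # A pair disagrees if it is an inter-cluster edge
        # or a missing intra-cluster edge.
        if has_edge != same_cluster:
            cost += 1
    return cost


# A triangle on {0, 1, 2} with a pendant edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(correlation_clustering_cost(4, edges, [0, 0, 0, 1]))  # 1: only edge (2, 3) is cut
print(correlation_clustering_cost(4, edges, [0, 0, 0, 0]))  # 2: pairs (0, 3) and (1, 3) are missing
print(correlation_clustering_cost(4, edges, [0, 1, 2, 3]))  # 4: every edge is cut
```

Examining every vertex pair takes time quadratic in $n$, which is exactly the cost that an algorithm running in time near-linear in $n$ cannot afford on dense graphs.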
Problem

Research questions and friction points this paper is trying to address.

Efficiently solving the exponential-size Cluster LP for Correlation Clustering
Developing a sublinear-time approximation algorithm for Correlation Clustering
Bridging the gap between approximation quality and speed in clustering methods
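
The exponential size comes from having one variable per vertex subset. The following is a sketch of the Cluster LP in the form usually attributed to CCL+24. The notation is an assumption made here for illustration and does not appear on this page: $E^+$ is the set of edges of the input graph, $E^-$ the set of non-adjacent vertex pairs, $z_S$ the fractional weight of candidate cluster $S$, and $x_{uv}$ the extent to which $u$ and $v$ are separated.

$$
\begin{aligned}
\text{minimize} \quad & \sum_{uv \in E^+} x_{uv} + \sum_{uv \in E^-} (1 - x_{uv}) \\
\text{subject to} \quad & \sum_{S \ni u} z_S = 1 && \text{for all } u \in V, \\
& \sum_{S \supseteq \{u,v\}} z_S = 1 - x_{uv} && \text{for all distinct } u, v \in V, \\
& z_S \ge 0 && \text{for all nonempty } S \subseteq V.
\end{aligned}
$$

An integral solution puts $z_S = 1$ exactly on the clusters of a partition, in which case the objective equals the number of inter-cluster edges plus the number of missing intra-cluster edges.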
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sublinear-time computation of a feasible, (1+ε)-approximate solution to the Cluster LP
Fast (1.437+ε)-approximation algorithm, with the rounding implemented within the same time bound
Exponential-size LP handled efficiently