🤖 AI Summary
This paper studies correlation clustering on general graphs: given an undirected graph, not necessarily complete, with each edge labeled $+$ or $-$, the goal is to partition the vertices so as to minimize the number of inconsistent edges (negative edges within clusters plus positive edges across clusters). We consider the parameterized setting in which the input graph becomes complete after deleting at most $k$ vertices. We present the first FPT constant-factor approximation algorithm for this problem on general graphs, sidestepping the Unique-Games hardness of polynomial-time constant-factor approximation on non-complete graphs. The method combines structural graph preprocessing, a constant-factor approximation subroutine for complete graphs, and enumeration with dynamic programming. The algorithm runs in time $2^{O(k^3)} \cdot \mathrm{poly}(n)$, so it applies to any input graph, with the exponential dependence confined to the parameter $k$.
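To make the objective concrete, here is a minimal Python sketch that counts the inconsistent edges of a given partition. The function name `count_disagreements`, the edge-dictionary representation, and the toy triangle are assumptions chosen for illustration; they do not come from the paper. Pairs that are absent from the dictionary play the role of missing edges and never contribute to the cost.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def count_disagreements(labels: Dict[Edge, str],
                        cluster_of: Dict[Hashable, int]) -> int:
    """Count '-' edges inside clusters plus '+' edges across clusters.

    labels maps a vertex pair (u, v) to '+' or '-'; pairs not listed are
    treated as missing edges and are ignored.
    """
    cost = 0
    for (u, v), sign in labels.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        if sign == "-" and same_cluster:
            cost += 1  # dissimilar endpoints placed together
        elif sign == "+" and not same_cluster:
            cost += 1  # similar endpoints separated
    return cost


# Toy instance (hypothetical): a triangle with two '+' edges and one '-' edge.
labels = {("a", "b"): "+", ("b", "c"): "+", ("a", "c"): "-"}

print(count_disagreements(labels, {"a": 0, "b": 0, "c": 0}))  # 1
print(count_disagreements(labels, {"a": 0, "b": 0, "c": 1}))  # 1
print(count_disagreements(labels, {"a": 0, "b": 1, "c": 2}))  # 2
```

On this triangle no partition reaches zero disagreements, which is the kind of conflict the minimization has to trade off.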
📝 Abstract
The Correlation Clustering problem is one of the most extensively studied clustering formulations due to its wide applications in machine learning, data mining, computational biology, and other areas. We consider the Correlation Clustering problem on general graphs: given an undirected graph (not necessarily complete) in which each edge is labeled $\langle + \rangle$ or $\langle - \rangle$, the goal is to partition the vertices into clusters so as to minimize the number of disagreements with the edge labeling, that is, the number of $\langle - \rangle$ edges within clusters plus the number of $\langle + \rangle$ edges between clusters. Here, a $\langle + \rangle$ (or $\langle - \rangle$) edge means that its end-vertices are similar (or dissimilar) and should belong to the same cluster (or to different clusters), and ``missing'' edges denote pairs for which we do not know whether the end-vertices are similar or dissimilar. Correlation Clustering is NP-hard even if the input graph is complete, and on general graphs it is Unique-Games hard to obtain a polynomial-time constant-factor approximation. With a complete graph as input, Correlation Clustering admits a $(1.994+\varepsilon)$-approximation. We investigate Correlation Clustering on general graphs from the perspective of parameterized approximability. We set the parameter $k$ to be the minimum number of vertices whose removal results in a complete graph, and obtain the first FPT constant-factor approximation for Correlation Clustering on general graphs, which runs in $2^{O(k^3)} \cdot \textrm{poly}(n)$ time.
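The parameter $k$ can be read as a vertex cover number: deleting a vertex set leaves a complete graph exactly when every missing pair has at least one endpoint in that set. The sketch below computes $k$ by brute force on a tiny instance. It is only an illustration of the definition, takes exponential time, and is not the paper's algorithm; the function name `min_deletions_to_complete` and the example graph are assumptions. Edge labels are omitted because only the presence or absence of an edge matters for $k$.

```python
from itertools import combinations
from typing import Hashable, Iterable, Sequence, Set, Tuple


def min_deletions_to_complete(
    vertices: Sequence[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[int, Set[Hashable]]:
    """Return (k, S): a smallest vertex set S whose removal leaves a complete graph.

    Brute force over subsets by increasing size; suitable for tiny graphs only.
    """
    present = {frozenset(e) for e in edges}
    # Pairs with no edge between them: each must lose at least one endpoint.
    missing = [frozenset(p) for p in combinations(vertices, 2)
               if frozenset(p) not in present]
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            if all(pair & chosen for pair in missing):
                return size, chosen
    raise AssertionError("unreachable: deleting all vertices always works")


# Hypothetical example: b, c, d form a triangle; a is adjacent only to d.
vertices = ["a", "b", "c", "d"]
edges = [("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]

print(min_deletions_to_complete(vertices, edges))  # (1, {'a'})
```

Here the missing pairs are (a, b) and (a, c), so deleting the single vertex a suffices and $k = 1$.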