Min-Max Correlation Clustering via Neighborhood Similarity

📅 2025-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the min-max correlation clustering problem on complete graphs whose edges are labeled + or −: partition the vertices into clusters so as to minimize the ℓ∞-norm of the disagreement vector, that is, the largest number of disagreements (positive edges cut by the clustering plus negative edges kept inside a cluster) incident to any single vertex. We present a (3+ε)-approximation algorithm, improving on the previous best-known guarantee of 4 by Heidrich, Irmai, and Andres. The algorithm is purely combinatorial and rests on a structural observation about the optimal min-max instance, which reduces the problem to O(|E⁺|) neighborhood similarity queries. Using random projection, these queries can be answered in nearly linear total time, Õ(|E⁺|), compared with O(|V|² + |V|D²) for the earlier 4-approximation. We also extend the algorithm to two large-scale computational models: in the Massively Parallel Computation (MPC) model it runs in O(1) rounds on machines whose memory is sublinear in the number of nodes, and in the semi-streaming model it uses Õ(|V|) space.

📝 Abstract

We present an efficient algorithm for the min-max correlation clustering problem. The input is a complete graph where edges are labeled as either positive $(+)$ or negative $(-)$, and the objective is to find a clustering that minimizes the $\ell_{\infty}$-norm of the disagreement vector over all vertices. We resolve this problem with an efficient $(3 + \epsilon)$-approximation algorithm that runs in nearly linear time, $\tilde{O}(|E^+|)$, where $|E^+|$ denotes the number of positive edges. This improves upon the previous best-known approximation guarantee of 4 by Heidrich, Irmai, and Andres, whose algorithm runs in $O(|V|^2 + |V| D^2)$ time, where $|V|$ is the number of nodes and $D$ is the maximum degree in the graph. Furthermore, we extend our algorithm to the massively parallel computation (MPC) model and the semi-streaming model. In the MPC model, our algorithm runs on machines with memory sublinear in the number of nodes and takes $O(1)$ rounds. In the streaming model, our algorithm requires only $\tilde{O}(|V|)$ space, where $|V|$ is the number of vertices in the graph. Our algorithms are purely combinatorial. They are based on a novel structural observation about the optimal min-max instance, which enables the construction of a $(3 + \epsilon)$-approximation algorithm using $O(|E^+|)$ neighborhood similarity queries. By leveraging random projection, we further show these queries can be computed in nearly linear time.
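
For readers who want the objective spelled out, the block below writes the standard min-max correlation clustering objective in symbols. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{C}$ is a clustering, $C(v)$ is the cluster containing $v$, $E^-$ is the set of negative edges, and $\mathrm{dis}_{\mathcal{C}}(v)$ is the entry of the disagreement vector at vertex $v$.

```latex
% Illustrative notation only: \mathcal{C}, C(v), E^- and \mathrm{dis} are assumed names.
\[
\mathrm{dis}_{\mathcal{C}}(v)
  = \bigl|\{\,u : \{u,v\} \in E^+,\ C(u) \neq C(v)\,\}\bigr|
  + \bigl|\{\,u \neq v : \{u,v\} \in E^-,\ C(u) = C(v)\,\}\bigr|,
\qquad
\min_{\mathcal{C}} \; \max_{v \in V} \; \mathrm{dis}_{\mathcal{C}}(v).
\]
```

The first term counts positive edges at $v$ that the clustering cuts; the second counts negative edges at $v$ that it keeps inside a cluster. Taking the maximum over $v$ is the $\ell_{\infty}$-norm of the disagreement vector.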

Problem

Research questions and friction points this paper is trying to address.

Minimize disagreement in graph clustering

Develop efficient approximation algorithm

Extend algorithm to parallel and streaming models
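
To make the quantity being minimized concrete, here is a minimal Python sketch that evaluates the min-max objective for a given clustering. It is not the paper's algorithm: it only scores a clustering. The function name `max_disagreement` and its input format (vertices numbered `0..n-1`, a list of positive edges, and a `cluster_of` list) are assumptions made for this example. Because the graph is complete, every pair not listed as positive is treated as negative.

```python
from collections import Counter


def max_disagreement(n, positive_edges, cluster_of):
    """Score a clustering under the min-max objective (hypothetical helper).

    n              -- number of vertices, labeled 0..n-1
    positive_edges -- iterable of (u, v) pairs with a + label; all other pairs are -
    cluster_of     -- cluster_of[v] is the cluster id assigned to vertex v

    Returns (largest per-vertex disagreement, list of per-vertex disagreements).
    """
    cluster_size = Counter(cluster_of[v] for v in range(n))
    pos_same = [0] * n   # positive edges at v that stay inside v's cluster
    pos_cross = [0] * n  # positive edges at v that are cut by the clustering

    for u, v in positive_edges:
        if cluster_of[u] == cluster_of[v]:
            pos_same[u] += 1
            pos_same[v] += 1
        else:
            pos_cross[u] += 1
            pos_cross[v] += 1

    per_vertex = []
    for v in range(n):
        # Cluster mates of v that are not positive neighbors are joined to v
        # by a negative edge inside the cluster, which is a disagreement.
        neg_same = cluster_size[cluster_of[v]] - 1 - pos_same[v]
        per_vertex.append(pos_cross[v] + neg_same)

    return max(per_vertex, default=0), per_vertex


# Toy instance: a positive triangle {0, 1, 2} with a pendant positive edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
worst, per_vertex = max_disagreement(4, edges, [0, 0, 0, 1])
print(worst, per_vertex)
```

The sketch runs in time linear in the number of vertices plus positive edges, since negative edges are never enumerated; they are counted from cluster sizes instead.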

Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient $(3 + ε)$-approximation algorithm

Extends to MPC and semi-streaming models

Uses neighborhood similarity and random projection
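
The summary above says only that neighborhood similarity queries are answered via random projection; it does not give the construction. The sketch below shows one generic way such a query could be approximated, and it should not be read as the paper's method. It assumes that similarity is measured by the size of the symmetric difference of closed positive neighborhoods, and it uses random ±1 (Rademacher) sign vectors as the projection. The function names, the `dim` parameter, and the seed are all hypothetical.

```python
import numpy as np


def neighborhood_sketches(n, positive_edges, dim=256, seed=0):
    """Build one dim-dimensional sketch per vertex (hypothetical helper).

    Each vertex w gets a random +/-1 vector; the sketch of v is the sum of
    these vectors over v's closed positive neighborhood N[v] (v included).
    Cost is O((n + |E+|) * dim).
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n, dim))
    sketches = signs.copy()  # start with v itself in N[v]
    for u, v in positive_edges:
        sketches[u] += signs[v]
        sketches[v] += signs[u]
    return sketches


def estimate_sym_diff(sketches, u, v):
    """Estimate |N[u] symmetric-difference N[v]| from two sketches.

    Shared neighbors cancel in the difference, so each coordinate is a signed
    sum over the symmetric difference; its square has expectation equal to
    the size of that set, and averaging over coordinates reduces variance.
    """
    diff = sketches[u] - sketches[v]
    return float(diff @ diff) / diff.shape[0]


# Two disjoint positive triangles: {0, 1, 2} and {3, 4, 5}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
sk = neighborhood_sketches(6, edges, dim=2048, seed=1)
print(estimate_sym_diff(sk, 0, 1))  # identical neighborhoods
print(estimate_sym_diff(sk, 0, 3))  # disjoint neighborhoods, true value is 6
```

A larger `dim` gives a tighter estimate at proportionally higher cost. After the sketches are built, each query costs O(`dim`) regardless of vertex degree, which is what makes many pairwise queries affordable.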

🔎 Similar Papers

Hierarchical Correlation Clustering and Tree Preserving Embedding

2020-02-18 · Computer Vision and Pattern Recognition · Citations: 6
