Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge

📅 2024-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the $k$-center clustering problem with instance-level must-link and cannot-link constraints. Even unconstrained $k$-center is NP-hard and cannot be approximated within a factor better than 2 unless P = NP, and the constraints add further computational and approximation barriers. To address this, we propose the first efficient approximation algorithm for the constrained problem that achieves a ratio of 2, the best possible. Our method combines reverse dominating sets, the integrality of an LP polyhedron, and LP duality to guarantee both feasibility of the clustering and the approximation bound. Experiments on multiple real-world datasets show that, compared with competitive baseline algorithms, our algorithm reduces clustering cost (average improvement of 12.7%), increases the silhouette coefficient (+0.15), and improves running time by over 40%. The approach thus pairs the best possible approximation guarantee with strong practical performance.

📝 Abstract
Center-based clustering has attracted significant research interest from both theory and practice. In many practical applications, input data contain background knowledge that can be used to improve clustering results. In this work, we build on the widely adopted $k$-center clustering and model its input background knowledge as must-link (ML) and cannot-link (CL) constraint sets. However, most clustering problems, including $k$-center, are inherently $\mathcal{NP}$-hard, while the more complex constrained variants are known to suffer from more severe approximation and computation barriers that significantly limit their applicability. By employing a suite of techniques including reverse dominating sets, the integral polyhedron of a linear programming (LP) formulation, and LP duality, we arrive at the first efficient approximation algorithm for constrained $k$-center with the best possible ratio of 2. We also construct competitive baseline algorithms and empirically evaluate our approximation algorithm against them on a variety of real datasets. The results validate our theoretical findings and demonstrate clear advantages of our algorithm in terms of clustering cost, clustering quality, and running time.
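For orientation: on unconstrained $k$-center, a factor of 2 is already achieved by classical greedy algorithms and is the best possible unless P = NP, which is why 2 is the target ratio above. Below is a minimal Python sketch of one such classical method, Gonzalez's farthest-first traversal. It ignores ML/CL constraints entirely and is not the authors' algorithm; the function name `gonzalez_k_center` and the use of Euclidean distance (`math.dist`) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gonzalez_k_center(points: Sequence[Point], k: int, first: int = 0) -> Tuple[List[int], float]:
    """Farthest-first traversal: a 2-approximation for *unconstrained* k-center.
    Returns the chosen center indices and the resulting clustering radius."""
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    centers = [first]
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        far = max(range(len(points)), key=dist.__getitem__)  # farthest point
        if dist[far] == 0:
            break  # every point already coincides with some center
        centers.append(far)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[far]))
    return centers, max(dist)
```

On the four points (0,0), (1,0), (10,0), (11,0) with k = 2, the sketch picks indices 0 and 3 and returns radius 1.0.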
Problem

Research questions and friction points this paper is trying to address.

Develop an efficient algorithm for constrained k-center clustering
Overcome NP-hardness and approximation barriers in clustering
Incorporate must-link and cannot-link constraints to improve results
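The constraint model in the list above can be made concrete with a small checker. This is a hypothetical sketch, not code from the paper: the helper names, the encoding of ML/CL constraints as index pairs, and Euclidean distance are all assumptions. It collapses must-link constraints (which are transitive) with union-find, and scores a given clustering by its $k$-center cost, the largest distance from any point to its cluster's center.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]  # hypothetical encoding: a constraint is a pair of point indices


def must_link_components(n: int, ml: Sequence[Pair]) -> List[int]:
    """Label each point with the root of its must-link component (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in ml:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def constraints_consistent(n: int, ml: Sequence[Pair], cl: Sequence[Pair]) -> bool:
    """False if some CL pair lies inside one ML component (trivially infeasible).
    Passing is necessary, not sufficient: whether k clusters can separate all
    CL pairs is a graph k-colouring question on the contracted components."""
    comp = must_link_components(n, ml)
    return all(comp[a] != comp[b] for a, b in cl)


def constrained_cost(points: Sequence[Point], centers: Sequence[int], assign: Sequence[int],
                     ml: Sequence[Pair], cl: Sequence[Pair]) -> Optional[float]:
    """k-center cost of a clustering, or None if it violates an ML or CL constraint.
    centers[j] is the index of cluster j's center; assign[i] is point i's cluster."""
    if any(assign[a] != assign[b] for a, b in ml):
        return None
    if any(assign[a] == assign[b] for a, b in cl):
        return None
    return max(math.dist(p, points[centers[assign[i]]]) for i, p in enumerate(points))
```

For example, with points (0,0), (1,0), (10,0), (11,0), ML = [(0, 1)], CL = [(1, 2)], centers [0, 3] and assignment [0, 0, 1, 1], `constrained_cost` returns 1.0, while moving point 1 into the second cluster breaks the must-link pair and yields None.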
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reverse dominating sets to obtain the approximation
Exploits an integral linear programming (LP) polyhedron
Leverages LP duality to attain the best possible ratio of 2
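The summary does not define reverse dominating sets, so the sketch below does not reproduce them. As background for why dominating sets appear at all: unconstrained $k$-center asks for the smallest radius $r$ at which the threshold graph (points joined when within distance $r$) has a dominating set of at most $k$ vertices, and the classical scheme of Hochbaum and Shmoys turns this view into a 2-approximation. The Python below is a minimal, unconstrained sketch of that classical scheme; the names `greedy_cover` and `threshold_k_center` are hypothetical and Euclidean distance is assumed.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_cover(points: Sequence[Point], k: int, r: float) -> Optional[List[int]]:
    """Repeatedly pick an uncovered point as a center and mark every point
    within 2r of it as covered. Returns the centers if at most k suffice."""
    uncovered = set(range(len(points)))
    centers: List[int] = []
    while uncovered:
        if len(centers) == k:
            return None  # more than k centers would be needed at this radius
        u = min(uncovered)  # any uncovered point works; min() keeps runs deterministic
        centers.append(u)
        uncovered = {v for v in uncovered if math.dist(points[u], points[v]) > 2 * r}
    return centers


def threshold_k_center(points: Sequence[Point], k: int) -> Tuple[List[int], float]:
    """Try candidate radii in increasing order. The optimal discrete k-center
    radius is 0 or a pairwise distance, and greedy_cover succeeds there, so the
    clustering returned at the first success has radius at most 2x optimal."""
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    candidates = sorted({0.0, *(math.dist(p, q) for p, q in combinations(points, 2))})
    for r in candidates:
        centers = greedy_cover(points, k, r)
        if centers is not None:
            radius = max(min(math.dist(p, points[c]) for c in centers) for p in points)
            return centers, radius
    raise RuntimeError("unreachable: the largest candidate radius always succeeds")
```

On the same four collinear points with k = 2, the first successful guess is r = 1, giving centers [0, 2] and radius 1.0. Each greedily chosen center is more than 2r from the others, so no ball of radius r can contain two of them; that is the step the factor-2 argument rests on.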
Authors

Longkun Guo
Fuzhou University
Algorithm design and analysis; data science; scheduling; mobile networks

Chaoqi Jia
School of Accounting, Information Systems and Supply Chain, RMIT University, Australia

Kewen Liao
Associate Professor, School of IT, Deakin University
Algorithms; data analytics; machine learning

Zhigang Lu
College of Science and Engineering, James Cook University, Australia

Minhui Xue
CSIRO’s Data61, Australia