Kempe Swap K-Means: A Scalable Near-Optimal Solution for Semi-Supervised Clustering

📅 2026-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses semi-supervised clustering under strict must-link (ML) and cannot-link (CL) constraints, aiming to balance solution quality and scalability. The authors propose a centroid-based heuristic that alternates between two steps: an assignment step that uses Kempe chain swaps to improve cluster assignments within the constrained solution space, and a centroid update step that recomputes the cluster centroids. A controlled perturbation strategy is added to strengthen global search and help escape local optima. The work is presented as the first to apply Kempe chain exchange to constrained clustering, and it maintains a high constraint satisfaction rate. In the reported experiments, the algorithm outperforms state-of-the-art methods on large-scale datasets in both clustering accuracy and computational efficiency.
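The centroid update and perturbation idea mentioned in the summary can be illustrated with a minimal sketch. The page gives no details of the perturbation scheme, so everything here is an assumption made for illustration: a squared-Euclidean objective (so the optimal centroid is the cluster mean), Gaussian jitter as the perturbation, and the hypothetical names `update_centroids`, `perturb_centroids`, and `strength`. It is not the authors' implementation.

```python
import random


def update_centroids(X, labels, prev_centroids):
    """Recompute each centroid as the mean of its assigned points.

    Assumes a squared-Euclidean objective, for which the mean is optimal.
    An empty cluster keeps its previous centroid (an illustrative choice).
    """
    k, dim = len(prev_centroids), len(X[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(X, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += x[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(prev_centroids[c])
        for c in range(k)
    ]


def perturb_centroids(centroids, strength, rng):
    """Hypothetical perturbation: add zero-mean Gaussian noise to every coordinate.

    `strength` is the noise standard deviation; 0.0 leaves the centroids unchanged.
    """
    return [[v + rng.gauss(0.0, strength) for v in mu] for mu in centroids]
```

One plausible way to keep the perturbation "controlled" is to accept a perturbed solution only when the objective after the next assignment step does not get worse; whether the paper does this is not stated on this page.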
📝 Abstract
This paper presents a novel centroid-based heuristic algorithm, termed Kempe Swap K-Means, for constrained clustering under rigid must-link (ML) and cannot-link (CL) constraints. The algorithm employs a dual-phase iterative process: an assignment step that utilizes Kempe chain swaps to refine current clustering in the constrained solution space and a centroid update step that computes optimal cluster centroids. To enhance global search capabilities and avoid local optima, the framework incorporates controlled perturbations during the update phase. Empirical evaluations demonstrate that the proposed method achieves near-optimal partitions while maintaining high computational efficiency and scalability. The results indicate that Kempe Swap K-Means consistently outperforms state-of-the-art benchmarks in both clustering accuracy and algorithmic efficiency for large-scale datasets.
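To make the assignment step more concrete, here is a minimal sketch of a Kempe chain swap on the cannot-link graph. It borrows the standard graph-coloring definition: for two clusters a and b, the chain of a point is its connected component in the subgraph of CL edges restricted to points currently in a or b, and exchanging the two labels inside that component cannot create a CL violation. The squared-Euclidean cost, the first-improvement rule, and all function names are assumptions for illustration; must-link groups are assumed to have been merged into single points beforehand. The paper's actual procedure may differ.

```python
from collections import deque


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kempe_chain(start, target, labels, cl_adj):
    """Points reachable from `start` over CL edges using only the two
    clusters labels[start] and `target` (breadth-first search)."""
    source = labels[start]
    chain, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in cl_adj.get(u, ()):
            if v not in chain and labels[v] in (source, target):
                chain.add(v)
                queue.append(v)
    return chain


def swap_delta(chain, a, b, labels, X, centroids):
    """Change in total squared distance if labels a and b are exchanged on the chain."""
    delta = 0.0
    for u in chain:
        new = b if labels[u] == a else a
        delta += sq_dist(X[u], centroids[new]) - sq_dist(X[u], centroids[labels[u]])
    return delta


def kempe_assignment_step(X, labels, centroids, cl_adj):
    """One pass over all points; apply the first cost-reducing Kempe swap found.

    Assumes `labels` is feasible on entry; feasibility is preserved by each swap.
    """
    labels = list(labels)
    for i in range(len(X)):
        a = labels[i]
        for b in range(len(centroids)):
            if b == a:
                continue
            chain = kempe_chain(i, b, labels, cl_adj)
            if swap_delta(chain, a, b, labels, X, centroids) < -1e-12:
                for u in chain:
                    labels[u] = b if labels[u] == a else a
                break
    return labels
```

Here `cl_adj` is a hypothetical adjacency dictionary mapping a point index to the indices it cannot share a cluster with. Because a swap moves the whole chain at once, a point that is blocked by a cannot-link partner in its preferred cluster can still reach it, which is what a plain nearest-feasible-centroid assignment cannot do.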
Problem

Research questions and friction points this paper is trying to address.

constrained clustering
semi-supervised clustering
must-link constraints
cannot-link constraints
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kempe chain swap
constrained clustering
semi-supervised clustering
centroid-based heuristic
scalable optimization
Yuxuan Ren
Fudan University
Optical Manipulation, Nonlinear Optics, Medical Image Analysis, Deep Learning, Beam Shaping
Shijie Deng
School of Industrial and Systems Engineering, Georgia Institute of Technology, USA