Label-consistent clustering for evolving data

📅 2025-12-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses clustering stability as data evolves by formally introducing the **label-consistent k-center problem**: given a point set $X$, a number of clusters $k$, a maximum number of allowed label changes $b$, and a historical clustering $H$, find a new clustering $C$ that minimizes the k-center cost while ensuring that at most $b$ points change their cluster assignments relative to $H$. The authors propose two algorithms, greedy center replacement and constrained local search, both of which explicitly incorporate label consistency and support incremental updates, and they prove that each achieves a constant approximation ratio. Experiments on multiple real-world datasets show that the methods outperform baselines, achieving high clustering quality while keeping the solution consistent with prior clusterings as it evolves.

📝 Abstract

Data analysis often involves an iterative process, where solutions must be continuously refined in response to new data. Typically, as new data becomes available, an existing solution must be updated to incorporate the latest information. In addition to seeking a high-quality solution for the task at hand, it is also crucial to ensure consistency by minimizing drastic changes from previous solutions. Applying this approach across many iterations ensures that the solution evolves gradually and smoothly. In this paper, we study the above problem in the context of clustering, specifically focusing on the $k$-center problem. More precisely, we study the following problem: Given a set of points $X$, parameters $k$ and $b$, and a prior clustering solution $H$ for $X$, our goal is to compute a new solution $C$ for $X$, consisting of $k$ centers, which minimizes the clustering cost while introducing at most $b$ changes from $H$. We refer to this problem as label-consistent $k$-center, and we propose two constant-factor approximation algorithms for it. We complement our theoretical findings with an experimental evaluation demonstrating the effectiveness of our methods on real-world datasets.
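To make the problem statement concrete, the following minimal sketch evaluates a candidate solution against the two quantities the abstract names: the $k$-center cost and the number of changes from $H$. It rests on assumptions that the abstract does not spell out: centers are taken to be input points, a clustering is stored as a list mapping each point to the index of its center, and a "change" is counted as a point whose assigned center differs from its center in $H$. The function name `evaluate` and the toy data are hypothetical.

```python
import math

def evaluate(points, new_labels, prev_labels, k, b):
    """Score a candidate clustering C against a prior clustering H.

    points      : list of coordinate tuples (the set X)
    new_labels  : new_labels[i] is the index of the center of point i in C
    prev_labels : prev_labels[i] is the index of the center of point i in H
    Assumption: a "change" is a point whose center index differs between H and C.
    """
    # k-center cost: the largest distance from any point to its assigned center.
    cost = max(math.dist(p, points[c]) for p, c in zip(points, new_labels))
    # Number of points whose label differs from the prior clustering.
    changes = sum(1 for old, new in zip(prev_labels, new_labels) if old != new)
    # Feasible if at most k distinct centers are used and at most b labels changed.
    feasible = len(set(new_labels)) <= k and changes <= b
    return cost, changes, feasible

# Toy instance: H assigned point 2 to the far-away center 0.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
prev_labels = [0, 0, 0, 2]
new_labels = [0, 0, 2, 2]   # only point 2 is relabeled

result = evaluate(points, new_labels, prev_labels, k=2, b=1)
strict = evaluate(points, new_labels, prev_labels, k=2, b=0)
```

In this toy instance, relabeling a single point lowers the cost from 10 to 1, so the candidate is feasible with a budget of $b = 1$ and infeasible with $b = 0$.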

Problem

Research questions and friction points this paper is trying to address.

Develops algorithms for updating clustering solutions with new data

Ensures minimal changes to maintain consistency with prior clustering

Focuses on label-consistent k-center problem with bounded modifications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Label-consistent clustering with limited changes

Constant-factor approximation algorithms for k-center

Ensures smooth evolution with bounded solution updates
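For context on what a constant-factor guarantee looks like in the unconstrained setting, the sketch below implements the classic farthest-first traversal (Gonzalez's greedy), a well-known 2-approximation for standard $k$-center. It is not one of the paper's two algorithms: it ignores the prior clustering $H$ and the budget $b$ entirely, and serves only as a point of reference. The function name `farthest_first` and the toy data are hypothetical.

```python
import math

def farthest_first(points, k, first=0):
    """Classic greedy for unconstrained k-center (not label-consistent).

    Returns the chosen center indices and the resulting k-center cost.
    """
    centers = [first]
    # dist[i] is the distance from point i to its nearest chosen center.
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < min(k, len(points)):
        # Pick the point that is currently farthest from all chosen centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers, max(dist)

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, cost = farthest_first(points, k=2)
```

Because this baseline recomputes centers from scratch, nothing limits how many points change labels between consecutive runs, which is the gap the label-consistent formulation is meant to close.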
