Improved Streaming Algorithm for Fair $k$-Center Clustering

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the fair $k$-center clustering problem over data streams, where the input comprises $m$ demographic groups and each group has an upper bound on the number of centers that may be selected from it. The authors propose a one-pass streaming algorithm built on a structure called the $\lambda$-independent center set: a reserved set of points is computed during the stream, and centers are then selected from it through a three-case analysis, with the most complicated case transformed into a specially constrained vertex cover problem on an auxiliary graph. The algorithm achieves a tight approximation ratio of 5 using $O(k \log n)$ memory. An offline variant attains a 3-approximation, matching the current state of the art. The approach also extends to a semi-structured stream in which the points of each group arrive in batches, giving a 3-approximation for $m = 2$ and a 4-approximation for general $m$. Experiments show that the methods outperform existing baselines in both clustering cost and runtime.

📝 Abstract

Many real-world applications pose challenges in incorporating fairness constraints into the $k$-center clustering problem, where the dataset consists of $m$ demographic groups, each with a specified upper bound on the number of centers to ensure fairness. Focusing on big data scenarios, this paper addresses the problem in a streaming setting, where data points arrive one by one sequentially in a continuous stream. Leveraging a structure called the $\lambda$-independent center set, we propose a one-pass streaming algorithm that first computes a reserved set of points during the streaming process. Then, for the post-streaming process, we propose an approach for selecting centers from the reserved point set by analyzing all three possible cases, transforming the most complicated one into a specially constrained vertex cover problem in an auxiliary graph. Our algorithm achieves a tight approximation ratio of 5 while consuming $O(k \log n)$ memory. It can also be readily adapted to solve the offline fair $k$-center problem, achieving a 3-approximation ratio that matches the current state of the art. Furthermore, we extend our approach to a semi-structured data stream, where data points from each group arrive in batches. In this setting, we present a 3-approximation algorithm for $m = 2$ and a 4-approximation algorithm for general $m$. Lastly, we conduct extensive experiments to evaluate the performance of our approaches, demonstrating that they outperform existing baselines in both clustering cost and runtime efficiency.
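
As a concrete reading of the objective, the following minimal Python sketch evaluates a candidate center set: it computes the $k$-center cost (the largest distance from any point to its nearest center) and checks the per-group upper bounds. The function names, the `caps` dictionary, the Euclidean metric, and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import dist


def kcenter_cost(points, centers):
    """Largest Euclidean distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)


def is_fair(center_groups, caps, k):
    """True if at most k centers are chosen and no group exceeds its cap.

    center_groups: group label of each chosen center (hypothetical encoding).
    caps: assumed mapping from group label to its upper bound on centers.
    """
    counts = Counter(center_groups)
    within_caps = all(counts[g] <= caps.get(g, 0) for g in counts)
    return len(center_groups) <= k and within_caps


# Toy instance: four points on a line, two groups, one center allowed per group.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
groups = ["a", "a", "b", "a"]
caps = {"a": 1, "b": 1}

chosen = [0, 2]  # indices of the candidate centers
cost = kcenter_cost(points, [points[i] for i in chosen])
fair = is_fair([groups[i] for i in chosen], caps, k=2)
```

On this toy instance the candidate set respects both caps, and every point lies within distance 1 of a chosen center.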

Problem

Research questions and friction points this paper is trying to address.

Develops streaming algorithm for fair k-center clustering

Addresses fairness constraints with demographic group bounds

Achieves a tight 5-approximation with $O(k \log n)$ memory

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-pass streaming algorithm using $\lambda$-independent center sets

Transforms complex case to constrained vertex cover problem

Achieves tight approximation ratios with logarithmic memory usage
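
This summary does not define the $\lambda$-independent center set, so the sketch below should not be read as the paper's algorithm. It illustrates only the classical threshold idea that such structures commonly build on, under the assumption that a radius guess `lam` is given: keep a point only if it is more than `2 * lam` away from every point kept so far. The function name, the factor 2, and the Euclidean metric are assumptions, and the group bounds are ignored entirely.

```python
from math import dist


def stream_independent_set(stream, lam, k):
    """One pass over the stream, keeping points pairwise more than 2*lam apart.

    Returns the kept points, or None as soon as more than k points are kept,
    which certifies that no k centers cover the stream within radius lam.
    """
    kept = []
    for p in stream:
        # A point within 2*lam of a kept point is already "represented".
        if all(dist(p, q) > 2 * lam for q in kept):
            kept.append(p)
            if len(kept) > k:
                return None  # the guess lam is too small
    return kept


stream = [(0, 0), (1, 0), (10, 0), (11, 0)]
ok = stream_independent_set(stream, lam=1.0, k=2)
too_small = stream_independent_set(stream, lam=0.25, k=2)
```

The rejection rule rests on a pigeonhole argument: if $k + 1$ points are pairwise more than $2\lambda$ apart, any $k$ centers of radius $\lambda$ would have to serve two of them from one center, which would put those two within $2\lambda$ of each other. At most $k$ points are ever held before a guess is accepted or rejected, so memory for a single guess is $O(k)$ points.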

🔎 Similar Papers

Learning Self-Growth Maps for Fast and Accurate Imbalanced Streaming Data Clustering