A Generic Framework for Fair Consensus Clustering in Streams

📅 2026-02-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses consensus clustering in streaming multi-agent environments under strict memory constraints, where both fairness and clustering quality must be preserved. The paper presents the first constant-factor approximation algorithm for this setting; it stores only $O(\log n)$ input clusterings instead of the full input. A key contribution is a general-purpose framework that decouples fairness considerations from the core clustering mechanism, so that existing fair clustering and cluster fitting techniques can be plugged in. The algorithm comes with a provable approximation guarantee in the streaming model, and the same framework extends to the offline setting and to k-median consensus clustering.

📝 Abstract

Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by different agents in a multi-agent environment that each consider various non-sensitive attributes, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al. introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best among the closest fair clusterings of the inputs; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor approximation algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
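
To make the consensus objective concrete, here is a minimal Python sketch of the unconstrained, offline baseline that picks the best input clustering. It is not the paper's streaming algorithm and it ignores fairness entirely. The abstract does not name a distance, so the sketch assumes the pair-disagreement distance that is common in consensus clustering: the number of point pairs that one clustering puts together and the other separates. The function names `pair_distance`, `consensus_cost`, and `best_input_clustering` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def pair_distance(a, b):
    """Count point pairs that one clustering joins and the other separates.

    Each clustering is a list of cluster labels, one label per point.
    ASSUMPTION: this distance is an illustrative choice, not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("clusterings must label the same set of points")
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def consensus_cost(candidate, inputs):
    """Sum of distances from a candidate clustering to all input clusterings."""
    return sum(pair_distance(candidate, c) for c in inputs)


def best_input_clustering(inputs):
    """Return the input clustering with the smallest total distance to the others.

    This needs every input in memory at once, which is the cost that a
    streaming algorithm tries to avoid.
    """
    return min(inputs, key=lambda c: consensus_cost(c, inputs))


# Three clusterings of four points, given as label lists.
inputs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
best = best_input_clustering(inputs)
print(best, consensus_cost(best, inputs))
```

In this toy instance the first clustering is selected, since it disagrees with each of the other two on three pairs while those two disagree with each other on four. A fair variant would first replace each input by a nearby fair clustering before making this selection, which is the step the abstract attributes to prior work.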

Problem

Research questions and friction points this paper is trying to address.

fair consensus clustering

streaming model

memory-limited

proportionate fairness

k-median consensus clustering

Innovation

Methods, ideas, or system contributions that make the work stand out.

fair consensus clustering

streaming algorithm

constant-factor approximation