🤖 AI Summary
Problem: Existing fair clustering methods optimize clustering objectives under explicit fairness constraints. The complexity of these constraints, or the approximations used to handle them, often degrades clustering utility or causes numerical instability.
Method: We propose Fair Clustering via Alignment (FCA), which builds on a decomposition of the fair K-means objective and alternates between two stages without complex fairness constraints. Stage I finds a joint probability distribution that aligns instances from different protected groups (an alignment inspired by the Wasserstein distance). Stage II updates the K-means cluster centers in the aligned space.
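To make the two-stage alternation concrete, here is a minimal sketch of a special case. It is not the authors' FCA implementation, which handles general fairness levels. The sketch assumes exactly two protected groups of equal size and perfect fairness. Under those assumptions, the joint distribution reduces to a one-to-one matching between the groups, computed here with SciPy's `linear_sum_assignment`.

The sketch rests on one identity. For a matched pair (x, y) that shares a center mu, ||x - mu||^2 + ||y - mu||^2 = 2 ||(x + y)/2 - mu||^2 + ||x - y||^2 / 2. The fair objective therefore splits into an alignment cost plus a K-means cost on the pair midpoints. The names `pair_cost` and `fair_kmeans_two_groups` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pair_cost(X0, X1, centers):
    """(n, n) matrix: cost of matching x0_i with x1_j when the pair is sent to
    its best shared center, i.e. min_k ||x0_i - mu_k||^2 + ||x1_j - mu_k||^2."""
    d0 = ((X0[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, K)
    d1 = ((X1[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, K)
    return (d0[:, None, :] + d1[None, :, :]).min(-1)


def _match_and_label(X0, X1, centers):
    # Stage (i): alignment, here a hard one-to-one coupling (assumption).
    rows, cols = linear_sum_assignment(pair_cost(X0, X1, centers))
    mid = (X0[rows] + X1[cols]) / 2.0
    # Nearest center to the midpoint is also the best shared center for the pair.
    labels = ((mid[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    return rows, cols, mid, labels


def fair_kmeans_two_groups(X0, X1, K, n_iter=20, seed=0):
    """Hypothetical sketch: perfectly balanced K-means for two equal-size groups."""
    assert len(X0) == len(X1), "sketch assumes equal group sizes"
    rng = np.random.default_rng(seed)
    X = np.vstack([X0, X1]).astype(float)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        _, _, mid, labels = _match_and_label(X0, X1, centers)
        # Stage (ii): Lloyd update of the centers in the aligned (midpoint) space.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = mid[labels == k].mean(0)
    rows, cols, _, labels = _match_and_label(X0, X1, centers)
    labels0 = np.empty(len(X0), dtype=int)
    labels1 = np.empty(len(X1), dtype=int)
    labels0[rows] = labels  # each matched pair shares one cluster
    labels1[cols] = labels
    return centers, labels0, labels1
```

Every matched pair lands in the same cluster, so each cluster receives equal counts from both groups. Neither step can increase the objective, because each minimizes it with the other held fixed. The `(n, n, K)` intermediate in `pair_cost` limits this sketch to small toy datasets.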
Contribution/Results: We prove that FCA achieves approximately optimal clustering utility for any given fairness level. On multiple benchmark datasets, it attains a better trade-off between fairness level and clustering utility than existing methods and reaches near-perfect fairness without numerical instability.
📝 Abstract
Algorithmic fairness in clustering aims to balance the proportions of instances assigned to each cluster with respect to a given sensitive attribute. While recently developed fair clustering algorithms optimize clustering objectives under specific fairness constraints, their inherent complexity or approximation often results in suboptimal clustering utility or numerical instability in practice. To resolve these limitations, we propose a new fair clustering algorithm based on a novel decomposition of the fair K-means clustering objective function. The proposed algorithm, called Fair Clustering via Alignment (FCA), operates by alternately (i) finding a joint probability distribution to align the data from different protected groups, and (ii) optimizing cluster centers in the aligned space. A key advantage of FCA is that it theoretically guarantees approximately optimal clustering utility for any given fairness level without complex constraints, thereby enabling high-utility fair clustering in practice. Experiments show that FCA outperforms existing methods by (i) attaining a superior trade-off between fairness level and clustering utility, and (ii) achieving near-perfect fairness without numerical instability.
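The fairness goal in the first sentence, balancing per-cluster proportions of a sensitive attribute, is commonly scored with a "balance" statistic. The sketch below uses one common definition, which is an assumption on our part: the paper may use a different fairness measure. For each cluster k and group s, it compares r_ks (group s's share inside cluster k) with r_s (its share in the whole dataset), then takes the worst-case ratio. A value of 1.0 means every cluster mirrors the overall group proportions. The function name `balance` is hypothetical.

```python
import numpy as np


def balance(labels, groups):
    """min over clusters k and groups s of min(r_ks / r_s, r_s / r_ks).
    Returns 0.0 if some cluster is missing a group entirely."""
    labels = np.asarray(labels)
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    overall = np.array([np.mean(groups == s) for s in group_ids])  # r_s
    worst = 1.0
    for k in np.unique(labels):
        in_k = groups[labels == k]
        share = np.array([np.mean(in_k == s) for s in group_ids])  # r_ks
        if np.any(share == 0):
            return 0.0
        ratio = np.minimum(share / overall, overall / share)
        worst = min(worst, float(ratio.min()))
    return worst
```

For example, with two equally represented groups, a cluster split 2:1 between them scores min(2/3 / 0.5, 0.5 / (1/3)) = 2/3. A perfectly mixed clustering scores 1.0.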