🤖 AI Summary
Fair clustering aims to mitigate imbalanced representation of sensitive groups across clusters; however, existing approaches—such as fairness-aware k-means variants—rely on prespecified numbers of clusters and predefined distance metrics, limiting their applicability to heterogeneous data. This paper introduces the first Bayesian fair clustering framework. It employs a novel fairness-aware prior distribution that enables automatic inference of the number of clusters and accommodates arbitrary data types—including categorical variables—without requiring explicit distance functions or fixed cluster counts. Posterior inference is performed via Markov Chain Monte Carlo (MCMC) sampling. Experiments on real-world datasets demonstrate that the method automatically identifies an appropriate number of clusters, achieves utility–fairness trade-offs competitive with state-of-the-art methods, and significantly outperforms existing techniques in categorical-data settings.
📝 Abstract
Fair clustering has become a socially significant task with the advancement of machine learning technologies and the growing demand for trustworthy AI. Group fairness requires that the proportions of each sensitive group be similar across all clusters. Most existing group-fair clustering methods are based on $K$-means clustering and thus require a distance between instances and a prespecified number of clusters. To resolve this limitation, we propose a fair Bayesian model-based clustering method, called Fair Bayesian Clustering (FBC). We develop a specially designed prior that puts its mass only on fair clusterings, and implement an efficient MCMC algorithm. Advantages of FBC are that it can infer the number of clusters and can be applied to any data type for which a likelihood is defined (e.g., categorical data). Experiments on real-world datasets show that FBC (i) reasonably infers the number of clusters, (ii) achieves a competitive utility-fairness trade-off compared to existing fair clustering methods, and (iii) performs well on categorical data.
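The group-fairness criterion above (each cluster's sensitive-group proportions should mirror the overall proportions) can be made concrete with a small sketch. The `balance` metric below, which scores a clustering by the worst-case ratio between a cluster's group proportion and the population proportion (1.0 = perfectly fair, 0 = a cluster missing a group entirely), is an illustrative implementation of this common fairness measure, not the paper's exact evaluation code; function names are our own.

```python
from collections import Counter

def group_proportions(labels, groups):
    """Per-cluster proportion of each sensitive group.

    `labels[i]` is the cluster of instance i, `groups[i]` its
    sensitive-group membership. Returns {cluster: {group: proportion}}.
    """
    props = {}
    for c in set(labels):
        members = [g for l, g in zip(labels, groups) if l == c]
        counts = Counter(members)
        props[c] = {g: counts[g] / len(members) for g in counts}
    return props

def balance(labels, groups):
    """Worst-case ratio between a cluster's group proportion and the
    overall proportion of that group (1.0 = perfectly fair)."""
    n = len(groups)
    overall = {g: c / n for g, c in Counter(groups).items()}
    props = group_proportions(labels, groups)
    ratios = []
    for c, p in props.items():
        for g, og in overall.items():
            r = p.get(g, 0.0) / og
            ratios.append(min(r, 1.0 / r) if r > 0 else 0.0)
    return min(ratios)

# Each cluster mirrors the overall 50/50 split: perfectly fair.
print(balance([0, 0, 1, 1], ["a", "b", "a", "b"]))  # 1.0

# Fully segregated clusters: worst-case balance of 0.
print(balance([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 0.0
```

A fairness-aware prior in the spirit of FBC would put (near-)zero mass on cluster assignments whose balance falls below a threshold, so that MCMC samples concentrate on fair clusterings.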