Fair Diversity Maximization with Few Representatives

2025-06-09
AI Summary
This paper studies the fairness-constrained diversity maximization problem: selecting $k$ representative points from a multi-label dataset such that at most a constant number of points is chosen from each label class, while maximizing the minimum pairwise distance. For this setting, where only a few representatives are selected per group, we propose the first three-stage framework: pre-pruning, padded decomposition, and label-aware assignment. A distance-driven pre-pruning step improves efficiency; a randomized padded decomposition balances intra-group sparsity against global diversity; and a label-aware cluster assignment mechanism ensures fair label coverage. We theoretically establish an approximation ratio of $\Omega(\sqrt{\log m}/m)$, where $m$ is the number of labels, improving on the previous state-of-the-art ratio. Experiments on large-scale datasets show that our approach achieves state-of-the-art performance in both minimum pairwise distance and label coverage, while strictly satisfying the fairness constraints.

Abstract
Diversity maximization is a well-studied problem where the goal is to find $k$ diverse items. Fair diversity maximization aims to select a diverse subset of $k$ items from a large dataset, while requiring that each group of items be well represented in the output. More formally, given a set of items with labels, our goal is to find $k$ items that maximize the minimum pairwise distance in the set, while ensuring that each label is represented within some budget. In many cases, one is only interested in selecting a handful (say, a constant number) of items from each group. In such a scenario we show that a randomized algorithm based on padded decompositions improves the state-of-the-art approximation ratio to $\sqrt{\log(m)}/(3m)$, where $m$ is the number of labels. The algorithm works in several stages: ($i$) a preprocessing pruning phase, which ensures that points with the same label are far away from each other; ($ii$) a decomposition phase, where points are randomly placed in clusters such that there is a feasible solution with at most one point per cluster and any feasible solution is diverse; and ($iii$) an assignment phase, where clusters are assigned to labels and a representative point with the corresponding label is selected from each cluster. We experimentally verify the effectiveness of our algorithm on large datasets.
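The objective and the budget constraint described in the abstract can be made concrete with a small sketch. It assumes Euclidean points and treats each label's budget as an upper bound on how many selected items may carry that label; the function names `min_pairwise_distance` and `within_budgets`, as well as the toy data, are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import dist


def min_pairwise_distance(points):
    """Diversity of a selection: the smallest distance between any two points."""
    if len(points) < 2:
        return float("inf")  # no pair exists, so nothing limits diversity
    return min(dist(p, q) for p, q in combinations(points, 2))


def within_budgets(labels, budgets):
    """True if no label is used more often than its (assumed upper) budget."""
    counts = Counter(labels)
    return all(counts[label] <= budgets.get(label, 0) for label in counts)


# Toy instance: (coordinates, label) pairs; the data and budgets are made up.
items = [((0.0, 0.0), "a"), ((3.0, 4.0), "b"), ((6.0, 8.0), "a")]
selection = [items[0], items[1]]
coords = [point for point, _ in selection]
labels = [label for _, label in selection]

diversity = min_pairwise_distance(coords)
feasible = within_budgets(labels, {"a": 1, "b": 1})
```

A selection is a candidate solution when `within_budgets` holds, and among such selections the problem asks for the one with the largest `min_pairwise_distance`.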
Problem

Research questions and friction points this paper is trying to address.

Maximize diversity while ensuring fair group representation
Select diverse items with label-specific budget constraints
Improve approximation ratio for few representatives per group
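One way to write the problem above formally is sketched below. The notation is an assumption made for illustration: $P$ is the item set, $d$ the distance function, $P_i$ the items with label $i$, and $u_i$ an upper budget for label $i$. The paper's own formulation may differ, for example by also imposing lower bounds per label.

```latex
\max_{S \subseteq P,\ |S| = k} \ \min_{\substack{p, q \in S \\ p \neq q}} d(p, q)
\quad \text{subject to} \quad
|S \cap P_i| \le u_i \ \text{ for all } i \in \{1, \dots, m\}
```

In the few-representatives regime the paper targets, each $u_i$ would be a small constant.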
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preprocessing pruning for label distance
Randomized padded decomposition clustering
Cluster-label assignment with diversity
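The pruning idea in the first item can be illustrated with a simple greedy filter. This is a minimal sketch under stated assumptions, not the paper's pruning rule: it assumes Euclidean points and a given threshold `radius`, and the function name `prune_same_label` is hypothetical. The decomposition and assignment stages are not sketched here because the summary does not give enough detail to reproduce them.

```python
from math import dist


def prune_same_label(items, radius):
    """Greedily keep points so that same-label survivors are >= radius apart.

    items: list of (coordinates, label) pairs. Points with different labels
    never block each other. Illustrative greedy filter only; the paper's
    exact pruning rule may differ.
    """
    kept_by_label = {}
    kept = []
    for point, label in items:
        survivors = kept_by_label.setdefault(label, [])
        if all(dist(point, other) >= radius for other in survivors):
            survivors.append(point)
            kept.append((point, label))
    return kept


items = [
    ((0.0, 0.0), "a"),
    ((0.5, 0.0), "a"),  # too close to the first "a" point, dropped
    ((0.5, 0.0), "b"),  # same location but another label, kept
    ((2.0, 0.0), "a"),  # far enough from the kept "a" point, kept
]
pruned = prune_same_label(items, radius=1.0)
```

After this filter, any two surviving points that share a label are at least `radius` apart, which is the property the preprocessing stage is described as providing.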