Relative Error Fair Clustering in the Weak-Strong Oracle Model

📅 2025-06-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies fair $k$-median clustering under a weak-strong dual-oracle distance model: a weak oracle provides low-cost but potentially inaccurate distance estimates, while a strong oracle answers exact distance queries at a high cost. The goal is to obtain a $(1+\varepsilon)$-approximation to the optimal fair solution using as few strong queries as possible. To this end, the authors construct the first $(1+\varepsilon)$-coresets for fair $k$-median clustering with rigorous theoretical guarantees. The method integrates weighted sampling, weak-distance calibration, and adaptive strong-query selection, and requires only $\mathrm{poly}(k/\varepsilon \cdot \log n)$ strong queries. The results also imply $(1+\varepsilon)$-coresets for standard (unconstrained) $(k,z)$-clustering with general $z=O(1)$ using a similar number of strong queries, whereas previous work achieved only a constant-factor ($>10$) approximation for standard $k$-clustering.

📝 Abstract

We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering on $n$ input points with a minimum number of strong oracle queries. This models the increasingly common trade-off between accurate but expensive similarity measures (e.g., large-scale embeddings) and cheaper but inaccurate alternatives. The study of fair clustering in the model is motivated by the important quest of achieving fairness with the presence of inaccurate information. We achieve the first $(1+\varepsilon)$-coresets for fair $k$-median clustering using $\text{poly}\left(\frac{k}{\varepsilon}\cdot\log n\right)$ queries to the strong oracle. Furthermore, our results imply coresets for the standard setting (without fairness constraints), and we could in fact obtain $(1+\varepsilon)$-coresets for $(k,z)$-clustering for general $z=O(1)$ with a similar number of strong oracle queries. In contrast, previous results achieved a constant-factor $(>10)$ approximation for the standard $k$-clustering problems, and no previous work considered the fair $k$-median clustering problem.
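To make the query-cost trade-off concrete, the following minimal Python sketch models the two oracles and counts strong queries. Everything in it is an illustrative assumption rather than the paper's construction: the class names `StrongOracle` and `WeakOracle`, the multiplicative noise model for the weak oracle, and the naive uniform sample used in place of a real coreset are all hypothetical.

```python
import math
import random


class StrongOracle:
    """Exact Euclidean distances; counts every query (hypothetical model)."""

    def __init__(self, points):
        self.points = points
        self.queries = 0

    def dist(self, i, j):
        self.queries += 1
        return math.dist(self.points[i], self.points[j])


class WeakOracle:
    """Cheap estimates. Assumed noise model: the exact distance scaled by a
    random factor in [1 - noise, 1 + noise], fixed once per pair."""

    def __init__(self, points, noise=0.5, seed=0):
        self.points = points
        self.noise = noise
        self.rng = random.Random(seed)
        self.cache = {}

    def dist(self, i, j):
        key = (min(i, j), max(i, j))
        if key not in self.cache:
            exact = math.dist(self.points[i], self.points[j])
            factor = 1.0 + self.rng.uniform(-self.noise, self.noise)
            self.cache[key] = exact * factor
        return self.cache[key]


def k_median_cost(oracle, indices, weights, centers):
    """Weighted k-median cost: each point pays its distance to the nearest center."""
    return sum(
        w * min(oracle.dist(i, c) for c in centers)
        for i, w in zip(indices, weights)
    )


rng = random.Random(1)
n, k, m = 1000, 3, 50
points = [(rng.random(), rng.random()) for _ in range(n)]
centers = [0, 1, 2]  # indices of the points used as centers

strong = StrongOracle(points)
weak = WeakOracle(points)

# Exact cost on all n points: n * k strong queries.
full_cost = k_median_cost(strong, range(n), [1.0] * n, centers)
full_queries = strong.queries

# Cost on a uniform sample of m points, each weighted n / m: m * k strong queries.
# A uniform sample is NOT a coreset in general; it only illustrates the saving.
strong.queries = 0
sample = rng.sample(range(n), m)
sample_cost = k_median_cost(strong, sample, [n / m] * m, centers)
sample_queries = strong.queries
```

Evaluating the cost on a small weighted subset is what keeps the number of strong queries low; the paper's contribution is choosing that subset so the estimate is provably within a $(1+\varepsilon)$ factor, which a uniform sample does not guarantee.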

Problem

Research questions and friction points this paper is trying to address.

Minimize strong oracle queries for fair clustering

Balance accuracy and cost in distance measurements

Achieve near-optimal fair clustering despite inaccurate distance information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines strong and weak oracle distance queries

Achieves near-optimal fair clustering with only poly(k/ε · log n) strong-oracle queries

First (1+ε)-coresets for fair k-median
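For reference, the standard (unconstrained) notion of a $(1+\varepsilon)$-coreset can be written as follows. This is the usual textbook definition, stated here as an assumption about notation: the symbols $P$, $S$, $w$, $C$, and $d$ are not taken from this summary. The fair variant additionally requires the guarantee to hold under the fairness constraints on assignments, and the paper's exact formulation may differ.

```latex
% Cost of a center set C (|C| = k) on the input point set P, for exponent z
% (z = 1 gives k-median):
\[
  \mathrm{cost}_z(P, C) \;=\; \sum_{p \in P} \min_{c \in C} d(p, c)^z .
\]
% A weighted subset (S, w) is a (1+\varepsilon)-coreset if, for every C with |C| = k,
\[
  (1-\varepsilon)\,\mathrm{cost}_z(P, C)
  \;\le\; \sum_{s \in S} w(s)\, \min_{c \in C} d(s, c)^z
  \;\le\; (1+\varepsilon)\,\mathrm{cost}_z(P, C) .
\]
```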

🔎 Similar Papers

Interpretable Clustering: A Survey