AI Summary
This work addresses the amplification of annotator biases toward sensitive attributes in crowdsourced label aggregation, a problem exacerbated by the lack of theoretical guarantees and effective fairness constraints in existing methods. Under the ε-fairness framework, we analyze mainstream aggregation approaches, derive an upper bound on the fairness gap of majority voting, and prove that it converges exponentially to the fairness gap of the true labels under reasonable conditions. We further extend, for the first time, continuous-domain multi-class fair post-processing algorithms to the discrete setting to strictly satisfy demographic parity constraints. Experiments on both synthetic and real-world datasets demonstrate that our method significantly improves fairness while maintaining high accuracy, thereby bridging the gap in theoretical understanding and discrete post-processing techniques for fairness in crowdsourced label aggregation.
Abstract
As acquiring reliable ground-truth labels is usually costly or infeasible, crowdsourcing and aggregating noisy human annotations is the typical resort. Aggregating subjective labels, though, may amplify individual biases, particularly regarding sensitive features, raising fairness concerns. Nonetheless, fairness in crowdsourced aggregation remains largely unexplored, with no existing convergence guarantees and only limited post-processing approaches for enforcing $\varepsilon$-fairness under demographic parity. We address this gap by analyzing the fairness of crowdsourced aggregation methods within the $\varepsilon$-fairness framework, for Majority Vote and Optimal Bayesian aggregation. In the small-crowd regime, we derive an upper bound on the fairness gap of Majority Vote in terms of the fairness gaps of the individual annotators. We further show that the fairness gap of the aggregated consensus converges exponentially fast to that of the ground truth under interpretable conditions. Since the ground truth itself may still be unfair, we generalize a state-of-the-art multiclass fairness post-processing algorithm from the continuous to the discrete setting, enforcing strict demographic parity constraints on any aggregation rule. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and corroborate the theoretical insights.
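To make the quantities in the abstract concrete, the following is a minimal toy simulation (not the paper's method or data) of the demographic-parity gap under Majority Vote. The bias model is invented for illustration: each annotator suppresses positive labels for one sensitive group with some probability, so individual annotations have a nonzero fairness gap even though the simulated ground truth is independent of the sensitive attribute. Aggregating more annotators by Majority Vote then shrinks the consensus gap back toward the (here, near-zero) gap of the ground truth, matching the convergence result described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 7  # items, annotators (m odd so majority is well defined)

a = rng.integers(0, 2, n)              # binary sensitive attribute per item
y = (rng.random(n) < 0.6).astype(int)  # simulated ground truth, independent of a

def annotate():
    """One biased annotator: flips positive labels of group a=1 to 0
    with probability 0.3 (an illustrative bias model, not from the paper)."""
    flip = (a == 1) & (y == 1) & (rng.random(n) < 0.3)
    return np.where(flip, 0, y)

votes = np.stack([annotate() for _ in range(m)])      # shape (m, n)
mv = (votes.sum(axis=0) * 2 > m).astype(int)          # Majority Vote consensus

def dp_gap(labels, attr):
    """Demographic-parity (fairness) gap: |P(label=1 | a=0) - P(label=1 | a=1)|."""
    return abs(labels[attr == 0].mean() - labels[attr == 1].mean())

print("mean annotator DP gap:", np.mean([dp_gap(v, a) for v in votes]))
print("majority-vote DP gap: ", dp_gap(mv, a))
print("ground-truth DP gap:  ", dp_gap(y, a))
```

In this setup the consensus gap falls strictly between the mean annotator gap and the (near-zero) ground-truth gap, illustrating why post-processing is still needed when the ground truth itself is unfair: aggregation alone can only converge to the ground truth's fairness, not beyond it.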