🤖 AI Summary
Social media toxicity detection suffers from unfair performance disparities across demographic groups, with existing methods lacking explicit fairness objectives for multi-group accuracy parity.
Method: This paper adopts Accuracy Parity (AP) as its fairness criterion (balanced classification accuracy across all protected demographic groups) and proposes Group Accuracy Parity (GAP), a differentiable loss function for end-to-end optimization. GAP is extended mathematically to settings with more than two demographic groups, replacing the heuristics typically used in prior work with a principled, gradient-based formulation.
Contribution/Results: We integrate GAP into a gradient-based neural training framework and empirically validate its efficacy. Experiments on real-world multi-group toxicity datasets demonstrate that GAP substantially reduces inter-group accuracy disparity, outperforming cross-entropy and other baselines in both fairness (measured by accuracy gap reduction) and overall performance (e.g., macro-F1, AUC). The method thus improves fairness without sacrificing predictive utility.
📝 Abstract
In algorithmic toxicity detection pipelines, it is important to identify which demographic group(s) are the subject of a post, a task commonly known as target (group) detection. While accurate detection is clearly important, we further advocate a fairness objective: to provide equal protection to all groups who may be targeted. To this end, we adopt Accuracy Parity (AP), meaning balanced detection accuracy across groups, as our fairness objective. However, in order to align model training with our AP fairness objective, we require an equivalent loss function. Moreover, for gradient-based models such as neural networks, this loss function needs to be differentiable. Because no such loss function exists today for AP, we propose Group Accuracy Parity (GAP): the first differentiable loss function having a one-to-one mapping to AP. We empirically show that GAP addresses disparate impact on groups for target detection. Furthermore, because a single post often targets multiple groups in practice, we also provide a mathematical extension of GAP to larger multi-group settings, something typically requiring heuristics in prior work. Our findings show that by optimizing AP, GAP better mitigates bias in comparison with other commonly employed loss functions.
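To make the AP objective concrete, the following minimal sketch computes per-group accuracy and the resulting accuracy gap on toy data. It illustrates the evaluation metric only, not the GAP loss, whose formula is not given here. The function names `group_accuracies` and `accuracy_gap`, the toy labels, and the choice to measure the gap as the maximum minus the minimum group accuracy are all assumptions made for illustration.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return {group: accuracy} for every group label that appears."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups):
    """Assumed gap measure: max minus min group accuracy (0 means parity)."""
    accs = group_accuracies(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())


# Hypothetical toy data: four posts about group "a", four about group "b".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(group_accuracies(y_true, y_pred, groups))  # {'a': 0.75, 'b': 0.5}
print(accuracy_gap(y_true, y_pred, groups))      # 0.25
```

Because this metric counts exact matches between hard predictions and labels, it is piecewise constant in the model's outputs and gives no useful gradient, which is why a differentiable loss is needed to optimize AP directly during training.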