🤖 AI Summary
This paper addresses fairness evaluation of multiclass classifiers under the equalized odds criterion. The authors propose an interpretable and quantifiable unfairness metric. Methodologically, they rigorously generalize equalized odds to the multiclass setting; unfairness is computed exactly from conditional confusion matrices and, when only aggregate statistics (e.g., per-class accuracies or confusion distributions) are available, a theoretically grounded lower bound enables black-box auditing. The contributions are threefold: (1) moving beyond coarse binary fair/unfair judgments to fine-grained, continuous quantification of unfairness; (2) requiring no access to raw data or model internals; and (3) providing lower-bound estimates with formal theoretical guarantees. The approach is validated across multiple real-world datasets, and all code and experimental materials are publicly released.
📝 Abstract
We propose a new interpretable measure of unfairness that enables a quantitative analysis of classifier fairness, beyond a dichotomous fair/unfair distinction. We show how this measure can be calculated when the classifier's conditional confusion matrices are known. We further propose methods for auditing classifiers for their fairness when the confusion matrices cannot be obtained or even estimated. Our approach lower-bounds the unfairness of a classifier based only on aggregate statistics, which may be provided by the owner of the classifier or collected from freely available data. We use the equalized odds criterion, which we generalize to the multiclass case. We report experiments on data sets representing diverse applications, which demonstrate the effectiveness and the wide range of possible uses of the proposed methodology. An implementation of the procedures proposed in this paper, as well as the code for running the experiments, is available at https://github.com/sivansabato/unfairness.
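To make the equalized odds setting concrete: the criterion requires the conditional distribution P(Ŷ = j | Y = i) to be identical across protected groups, so any gap between a group's row-normalized confusion matrix and a pooled baseline signals unfairness. The sketch below is only a toy illustration under that framing; the `toy_unfairness` function and its max-gap score are hypothetical and do not reproduce the paper's exact metric, which is defined in the paper itself.

```python
import numpy as np

def toy_unfairness(conf_mats, group_weights=None):
    """Illustrative (not the paper's) unfairness score from per-group
    confusion matrices.

    conf_mats: list of (k, k) count or rate matrices, one per group,
    where row i tallies predictions for true class i. Each row is
    normalized so row i becomes P(Yhat | Y = i) for that group; the
    score is the largest absolute gap, over all (true, predicted)
    pairs, between any group's rate and the weighted pooled rate.
    """
    mats = np.stack([m / m.sum(axis=1, keepdims=True) for m in conf_mats])
    if group_weights is None:
        group_weights = np.full(len(conf_mats), 1.0 / len(conf_mats))
    # Weighted average of the per-group conditional confusion matrices.
    pooled = np.tensordot(group_weights, mats, axes=1)
    return float(np.max(np.abs(mats - pooled)))

# Example: two groups, binary classification. A score of 0 would mean
# the groups' conditional confusion matrices coincide exactly.
g0 = np.array([[0.9, 0.1], [0.2, 0.8]])
g1 = np.array([[0.7, 0.3], [0.2, 0.8]])
print(toy_unfairness([g0, g1]))  # prints 0.1: max gap vs. pooled rates
```

A continuous score like this, rather than a yes/no test, is what allows the fine-grained comparison of classifiers that the abstract describes.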