🤖 AI Summary
Auditing the individual fairness of U.S. recidivism risk assessment (RRA) tools raises a critical operational question: which demographic attributes should the similarity metric used for fairness evaluation include? Method: This study conducts the first controlled human-subject experiment, combining statistical significance testing (two-sided t-tests) with individual-level similarity function modeling, to empirically assess attribute relevance in fairness judgments. Contribution/Results: Age and sex have statistically significant effects on individual fairness assessments (p < 0.01), warranting their inclusion in the similarity metric; race shows no significant effect (p = 0.32) and should be excluded. These findings bridge the operational gap between legal principles, such as anti-discrimination mandates, and technical fairness practice, specifically the design of individual fairness metrics, by providing the first empirically grounded, deployable guideline for sensitive attribute selection in RRA tool audits.
📝 Abstract
Despite its U.S. constitutional foundation, the technical "individual fairness" criterion has not been operationalized in state or federal statutes or regulations. We conduct a human-subjects experiment to address this gap, evaluating which demographic features are relevant to individual fairness evaluation of recidivism risk assessment (RRA) tools. Our analyses conclude that the individual similarity function should consider age and sex but ignore race.
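
To make the conclusion concrete, the following is a minimal sketch of a similarity (distance) function that uses age and sex and ignores race, paired with the standard Lipschitz-style individual fairness check (similar individuals should receive similar scores). It is an illustration under stated assumptions, not the paper's method: the `Person` record, the weights `W_AGE` and `W_SEX`, the `AGE_SCALE` normalizer, and the `lipschitz` constant are all hypothetical, as are the example scores and group labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    age: int
    sex: str
    race: str  # recorded, but deliberately unused by distance()


# Hypothetical parameters chosen only for illustration.
AGE_SCALE = 50.0          # age gap (in years) treated as maximally dissimilar
W_AGE, W_SEX = 0.5, 0.5   # relative weights of the two attributes


def distance(a: Person, b: Person) -> float:
    """Dissimilarity in [0, 1] based on age and sex only; race is ignored."""
    age_term = min(abs(a.age - b.age) / AGE_SCALE, 1.0)
    sex_term = 0.0 if a.sex == b.sex else 1.0
    return W_AGE * age_term + W_SEX * sex_term


def violates_individual_fairness(
    a: Person, b: Person, score_a: float, score_b: float, lipschitz: float = 1.0
) -> bool:
    """Flag a pair whose risk scores differ by more than their distance allows."""
    return abs(score_a - score_b) > lipschitz * distance(a, b)


# Two people who differ only in race are at distance 0, so any score gap is flagged.
p1 = Person(age=30, sex="F", race="group_x")
p2 = Person(age=30, sex="F", race="group_y")
# A person who differs in age and sex is allowed a larger score gap.
p3 = Person(age=55, sex="M", race="group_x")

print(distance(p1, p2), violates_individual_fairness(p1, p2, 0.2, 0.6))
print(distance(p1, p3), violates_individual_fairness(p1, p3, 0.2, 0.6))
```

Because race does not enter `distance`, two individuals who differ only in race count as identical, so an audit under this sketch would flag any difference in their risk scores, while differences in age or sex permit proportionally different scores.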