🤖 AI Summary
This paper addresses systemic bias in predictive algorithms against protected groups defined by attributes such as race, gender, and age. To this end, the paper introduces DSLD, the first open-source toolkit to integrate statistical modeling, causal inference, and legal compliance perspectives, available for both R and Python. Methodologically, DSLD unifies confounder identification, bias quantification (e.g., disparate impact, equalized odds), counterfactual debiasing, and interactive visualization. It is accompanied by an 80-page Quarto-based pedagogical manual that bridges theory, implementation, and real-world discrimination case studies. The key contribution is a transparent, reproducible, interdisciplinary fairness-analysis workflow, designed for auditability and regulatory alignment, that has been deployed in university statistics instruction and public policy evaluation. Empirical results demonstrate significant improvements in learners' ability to empirically detect algorithmic bias and in legal practitioners' capacity to interpret technical fairness assessments.
📝 Abstract
The growing power of data science can play a crucial role in addressing social discrimination, which calls for a nuanced understanding of bias and effective mitigation strategies. "Data Science Looks At Discrimination" (DSLD) is an R and Python package that provides users with a comprehensive toolkit of statistical and graphical methods for assessing possible discrimination related to protected attributes such as race, gender, and age. The package addresses critical issues by identifying and mitigating confounders and by reducing bias against protected groups in prediction algorithms. In educational settings, DSLD offers instructors powerful tools for teaching statistical principles through motivating real-world examples of discrimination analysis. An accompanying 80-page Quarto book further supports users, from statistics educators to legal professionals, in applying these analytical tools to real-world scenarios.
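To make the bias-quantification idea concrete, here is a minimal sketch of one metric named in the summary, the disparate impact ratio (the favorable-outcome rate of the protected group divided by that of the reference group). The data, function name, and 0.8 threshold below are illustrative assumptions, not the DSLD API or results from the paper.

```python
def disparate_impact(outcomes, groups, protected, reference, favorable=1):
    """Ratio of favorable-outcome rates: protected group vs. reference group."""
    def rate(g):
        members = [o for o, grp in zip(outcomes, groups) if grp == g]
        return sum(o == favorable for o in members) / len(members)
    return rate(protected) / rate(reference)

# Toy hiring data (hypothetical): 1 = hired, 0 = not hired.
outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0]
groups   = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

ratio = disparate_impact(outcomes, groups, protected="A", reference="B")
# rate(A) = 3/5 = 0.6, rate(B) = 4/5 = 0.8, so the ratio is 0.75.
# Under the common "four-fifths rule," a ratio below 0.8 flags
# potential adverse impact against the protected group.
print(round(ratio, 2))  # → 0.75
```

In practice one would compute this from a data frame of predictions and group labels; DSLD packages such computations alongside confounder analysis and visualization.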