🤖 AI Summary
This study addresses fairness deficiencies in machine learning-based early warning systems that higher education institutions use to allocate student support resources, focusing on disparities by gender, age, and residency status. Building on a multi-year collaboration with Centennial College, the authors replicate the institution’s deployed system from its training data and design specifications, and present a replicable audit methodology that evaluates the entire pipeline, from training data through model predictions to post-processing, using standard fairness metrics while considering construct validity alongside statistical fairness. Their analysis finds that younger, male, and international students are disproportionately flagged for support even though many ultimately succeed, while older and female students with comparable dropout risk are under-identified. Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers. The work offers both a replicable methodology and empirical evidence on how disparities emerge and compound across the stages of an institutional machine learning system.
📝 Abstract
Fairness audits of institutional risk models are critical for understanding how deployed machine learning pipelines allocate resources. Drawing on a multi-year collaboration with Centennial College, where our prior ethnographic work introduced the ASP-HEI Cycle, we present a replica-based audit of a deployed Early Warning System (EWS), replicating its model using institutional training data and design specifications. We evaluate disparities by gender, age, and residency status across the full pipeline (training data, model predictions, and post-processing) using standard fairness metrics. Our audit reveals systematic misallocation: younger, male, and international students are disproportionately flagged for support, even when many ultimately succeed, while older and female students with comparable dropout risk are under-identified. Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers. This work provides a replicable methodology for auditing institutional ML systems and shows how disparities emerge and compound across stages, highlighting the importance of evaluating construct validity alongside statistical fairness. It contributes one empirical thread to a broader program investigating algorithms, student data, and power in higher education.
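To make the post-processing step concrete, the following is a minimal sketch of how percentile-based tiering and a group-wise audit might look. It is not the authors' implementation: the function names (`percentile_tiers`, `group_rates`), the tier cutoffs, the choice of flag rate and false positive rate as metrics, and the ten-student toy data are all assumptions made purely for illustration.

```python
from collections import defaultdict


def percentile_tiers(probs, cutoffs=(0.5, 0.8)):
    """Assign each score a tier from its percentile rank (0 = lowest tier).

    The cutoffs are hypothetical; a deployed system would define its own.
    """
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    tiers = [0] * n
    for rank, i in enumerate(order):
        pct = rank / n  # fraction of students sorted before this one
        tiers[i] = sum(pct >= c for c in cutoffs)
    return tiers


def group_rates(groups, flagged, dropped_out):
    """Per-group flag rate and false positive rate.

    A false positive is a flagged student who did not drop out.
    """
    stats = defaultdict(lambda: {"n": 0, "flagged": 0, "neg": 0, "fp": 0})
    for g, f, y in zip(groups, flagged, dropped_out):
        s = stats[g]
        s["n"] += 1
        s["flagged"] += int(f)
        if not y:
            s["neg"] += 1
            s["fp"] += int(f)
    return {
        g: {
            "flag_rate": s["flagged"] / s["n"],
            "fpr": s["fp"] / s["neg"] if s["neg"] else float("nan"),
        }
        for g, s in stats.items()
    }


# Toy data, invented for illustration only (not from the study).
probs = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.90]
groups = ["older", "older", "younger", "older", "younger",
          "older", "younger", "older", "younger", "younger"]
dropped_out = [False, False, False, False, False,
               False, False, True, False, True]

tiers = percentile_tiers(probs)
flagged = [t == 2 for t in tiers]  # only the top tier triggers support
rates = group_rates(groups, flagged, dropped_out)
print(tiers)
print(rates)
```

In this toy data, the students scored 0.60 and 0.90 land in the same top tier, which shows how tiering can discard differences between heterogeneous probabilities. Comparing `flag_rate` and `fpr` before and after tiering, per group, is one way such an audit could locate where a disparity enters the pipeline.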