🤖 AI Summary
This study investigates how expanding the historical training window affects predictive performance and algorithmic fairness under temporal distribution shift, including covariate and concept shift. Using simulation experiments and an empirical study of student retention prediction across multi-institution, multi-year educational datasets, we find that the common assumption that "more data is better" fails in concept-shift-dominant regimes: extending the training window degrades overall accuracy and widens fairness disparities across sociodemographic groups, particularly when marginalized subpopulations experience heterogeneous concept shift, which amplifies bias nonlinearly. Our key contributions are (i) identifying concept shift, rather than covariate shift, as the primary driver of performance degradation, and (ii) the first systematic characterization of its nonlinear, fairness-amplifying mechanism. These findings provide both theoretical grounding and practical guidance for model update strategies in dynamic, real-world deployment environments.
📝 Abstract
Predictive models are typically trained on historical data to predict future outcomes. While it is commonly assumed that training on more historical data improves model performance and robustness, data distribution shifts over time may undermine these benefits. This study examines how expanding historical training windows under covariate shifts (changes in feature distributions) and concept shifts (changes in feature-outcome relationships) affects the performance and algorithmic fairness of predictive models. First, we perform a simulation study to explore scenarios with varying degrees of covariate and concept shifts in the training data. Absent distribution shifts, we observe performance gains from longer training windows, though these gains quickly plateau; in the presence of concept shifts, performance may actually decline. Covariate shifts alone do not significantly affect model performance, but they may complicate the impact of concept shifts. In terms of fairness, models produce more biased predictions when the magnitude of concept shifts differs across sociodemographic groups; for intersectional groups, these effects are more complex and not simply additive. Second, we conduct an empirical case study of student retention prediction, a common machine learning application in education, using 12 years of student records from 23 minority-serving community colleges in the United States. We find concept shifts to be a key contributor to performance degradation when expanding the training window. Moreover, model fairness is compromised when marginalized populations exhibit distribution shift patterns distinct from those of their peers. Overall, our findings caution against the conventional wisdom that "more data is better" and underscore the importance of using historical data judiciously, especially when it may be subject to distribution shifts, to improve model performance and fairness.
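The core phenomenon in the abstract, that a longer training window can hurt when the feature-outcome relationship changes, can be illustrated with a stylized simulation. The sketch below is not the paper's actual experimental setup; the generating weights, shift timing, and window sizes are illustrative assumptions. It generates several "years" of data whose labeling concept flips partway through, then compares a model trained on the recent post-shift years against one trained on the full history.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_year(w, n=500):
    """One year of data: 2-D features, labels from concept vector w plus noise."""
    X = rng.normal(size=(n, 2))
    y = (X @ w + 0.1 * rng.normal(size=n) > 0).astype(float)
    return X, y

# Six historical years; the concept flips sign after year 3 (a concept shift).
w_old, w_new = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
years = [make_year(w_old) for _ in range(3)] + [make_year(w_new) for _ in range(3)]
X_test, y_test = make_year(w_new)  # the "next" year follows the new concept

def fit_logreg(X, y, steps=2000, lr=0.1):
    """Plain logistic regression via batch gradient descent (no intercept)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def window_accuracy(k):
    """Train on the most recent k years; evaluate on the held-out test year."""
    X = np.vstack([x for x, _ in years[-k:]])
    y = np.concatenate([t for _, t in years[-k:]])
    w = fit_logreg(X, y)
    preds = (np.clip(X_test @ w, -30, 30) > 0).astype(float)
    return float((preds == y_test).mean())

acc_short = window_accuracy(3)  # post-shift years only
acc_full = window_accuracy(6)   # full history spans the concept shift

print(f"recent window: {acc_short:.3f}, full history: {acc_full:.3f}")
```

Because the pre- and post-shift years encode opposite concepts, pooling them nearly cancels the learnable signal, so the full-history model hovers near chance while the short-window model remains accurate. This mirrors the abstract's claim, and analogous per-group versions of this setup (shifting the concept for one subgroup only) reproduce the fairness effects described there.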